Presented by Abhijeet Birari
UNIT V
ANALYSIS OF DATA
Collection of Data → Analysis of Data → Draw Logical Inferences
STATISTICAL SOFTWARE PACKAGES
Follow the link >>>
http://www.coedu.usf.edu/main/departments/me/MeasurementandResearchStatisticalSoftwarePackages.html
Statistical Package for Social Sciences
WHAT IS SPSS?
• SPSS Statistics is a software package used for statistical analysis.
• SPSS can be used for:
– Processing questionnaires
– Reporting in tables and graphs
– Analyzing
  • Mean, Median, Mode
  • Mean Deviation & Standard Deviation
  • Correlation & Regression
  • Chi-Square, t-test, Z-test, ANOVA, MANOVA, Factor Analysis, Cluster Analysis, Multidimensional Scaling, etc.
• Launched in 1968 and acquired by IBM in 2009.
WHAT IS A HYPOTHESIS?
“A statement speculating on the outcome of a research study or experiment.”
• H0 = There is no difference in performance of Div. A, B and C in Semester I
• Ha = Business Communication subject has been effective in developing communication skills of students
• H0 = Biometric system has not improved the attendance of faculty members
• Ha = Excessive fishing has affected marine life
• H0 = There is no significant difference in salary of males and females in a particular organization.
Here,
H0 = Null Hypothesis
Ha = Alternate Hypothesis
WHAT IS LEVEL OF SIGNIFICANCE?
When the null hypothesis is true, you should accept (fail to reject) it.
When it is false, you should reject it.
A 5% level of significance means you are taking a 5% risk of rejecting the null hypothesis when it happens to be true (a Type I error).
It is the maximum probability of rejecting H0 when it is true.
TYPES OF STATISTICAL TESTS
Parametric Tests
– Meaning: Based on the assumption that the population from which the sample is drawn is normally distributed.
– When it is used: To test parameters such as mean, standard deviation, proportions, etc.
– Statistical tests used: t-test, ANOVA, ANCOVA, MANOVA, Karl Pearson's correlation
Non-parametric Tests
– Meaning: Do not require any assumption regarding the shape of the population distribution.
– When it is used: Mostly for categorical variables, or for small samples where normality is violated.
– Statistical tests used: Chi-Square, Mann-Whitney U, Wilcoxon Signed Rank, Kruskal-Wallis, Spearman's rank correlation
ANOVA
(Analysis of Variance)
INTRODUCTION
• Significance of the difference between the means of two samples can be judged using:
– Z-test (sample size > 30)
– t-test (sample size < 30)
• Difficulty arises when measuring the difference between the means of more than 2 samples.
• ANOVA is used in such cases.
• ANOVA is used to test the significance of the difference between more than two sample means and to make inferences about whether our samples are drawn from populations having the same mean.
Examples:
– Significance of difference of IQ of 2 divisions: Z-test or t-test
– Significance of difference between performance of 5 different types of vehicles: ANOVA
WHEN TO USE ANOVA?
Compare yield of crop from several varieties of seeds
Mileage of 4 automobiles
Spending habits of five groups of students
Productivity of 4 different types of machine during a given period of time
Effectiveness of fitness programme on increase in stamina of 5 players
WHY ANOVA INSTEAD OF MULTIPLE T TEST?
• If there are more than two groups, why not just do several two-sample t-tests to compare the mean from one group with the mean from each of the other groups?
• The problem with the multiple t-tests approach is that as the number of groups increases, the number of two-sample t-tests also increases.
• As the number of tests increases, the probability of making a Type I error also increases.
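To make the point above concrete, here is a minimal sketch of how the chance of at least one Type I error grows with the number of pairwise t-tests. It assumes the tests are independent, which is a simplification (pairwise tests on shared groups are not strictly independent); the function name `familywise_error_rate` is hypothetical.

```python
from math import comb


def familywise_error_rate(groups: int, alpha: float = 0.05):
    """Return (number of pairwise t-tests, chance of at least one Type I error).

    Assumption: the tests are treated as independent, so the chance that
    none of them gives a false rejection is (1 - alpha) ** tests.
    """
    tests = comb(groups, 2)            # every pair of groups gets one t-test
    fwer = 1 - (1 - alpha) ** tests    # complement of "no false rejection at all"
    return tests, fwer


for k in (3, 4, 5):
    tests, fwer = familywise_error_rate(k)
    print(f"{k} groups -> {tests} t-tests, chance of a Type I error ~ {fwer:.3f}")
```

Under this assumption, three groups already push the overall risk to roughly 14%, well above the 5% used for each individual test.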
ANOVA HYPOTHESES
• The Null hypothesis for ANOVA is that the means for all groups are equal.
• The Alternative hypothesis for ANOVA is that at least two of the means are not equal.
ONE WAY ANOVA AND TWO WAY ANOVA
What is 1-way ANOVA and 2-way ANOVA?
• If we take only one factor and investigate the difference among its various categories having numerous possible values, it is called One-way ANOVA.
• In case we investigate two factors at the same time, then we use Two-way ANOVA.

One-way ANOVA
Training Type    Productivity
Advanced         200
Advanced         193
Advanced         207
Intermediate     172
Intermediate     179
Intermediate     186
Beginners        130
Beginners        125
Beginners        119

Two-way ANOVA
Gender    Educational Level    Marks
Male      School               89
Male      College              50
Male      School               90
Male      College              80
Female    College              50
Female    University           40
Female    School               91
Female    University           56
HOW ANOVA WORKS?
• Three methods used to dissolve a powder in water are compared by the time (in minutes) it takes until the powder is fully dissolved. The results are summarized in the following table:
• It is thought that the population means of the three methods m1, m2 and m3 are not all equal (i.e., at least one m is different from the others). How can this be tested?
• One way is to use multiple two-sample t-tests and compare all the pairs:
– Method 1 with Method 2,
– Method 1 with Method 3 and
– Method 2 with Method 3
• But if each test is run at the 0.05 level, the probability of making a Type I error somewhere across the three tests is higher than 0.05.
• A better method is ANOVA (analysis of variance).
• The technique requires the analysis of different forms of variances, hence the name.
Important: ANOVA is used to show that means are different, not that variances are different.
• ANOVA compares two types of variances
– The variance within each sample and
– The variance between different samples.
• The black dotted arrows show the per-sample variation of the individual data points around the sample mean (the variance within).
• The red arrows show the variation of the sample means around the grand mean (the variance between).
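STEPS FOR USING ANOVA
Step 1: State Null and Alternate Hypothesis
Null Hypothesis H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
Alternate Hypothesis Ha: at least two of the means $\mu_1, \mu_2, \dots, \mu_k$ are not equal
Step 2: Compute Variance Between the samples
1. Calculate the mean of each sample ($\bar{x}_1, \bar{x}_2, \bar{x}_3, \dots, \bar{x}_k$)
2. Calculate the mean of the sample means:
$$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \dots + \bar{x}_k}{k}$$
where k = total number of samples
3. Calculate the Sum of Squares between the samples:
$$SS_{between} = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + n_3(\bar{x}_3 - \bar{\bar{x}})^2 + \dots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$
where n1 = total number of items in sample 1
n2 = total number of items in sample 2
n3 = total number of items in sample 3, and so on up to nk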
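Step 3: Compute Variance Within samples
1. Calculate the Sum of Squares within the samples:
$$SS_{within} = \sum_i (x_{1i} - \bar{x}_1)^2 + \sum_i (x_{2i} - \bar{x}_2)^2 + \sum_i (x_{3i} - \bar{x}_3)^2 + \dots + \sum_i (x_{ki} - \bar{x}_k)^2$$
Step 4: Calculate total variance
$$SS_{total} = SS_{between} + SS_{within}$$
Step 5: Calculate average variance between and within samples
$$MS_{between} = \frac{SS_{between}}{k - 1}, \qquad MS_{within} = \frac{SS_{within}}{n - k}$$
n = total number of items in all samples
k = number of samples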
Step 6: Calculate F-ratio
$$F = \frac{MS_{between}}{MS_{within}}$$
Step 7: Set up ANOVA table
Source of variation    Sum of squares (SS)    Degree of freedom (d.f.)    Mean Squares                        F-Value (Calculated)
Between Samples        SS Between             k-1                         MS Between = SS Between / (k-1)     F = MS Between / MS Within
Within Samples         SS Within              n-k                         MS Within = SS Within / (n-k)
Total                  SS Total               n-1
Decision Rule: Reject H0 if
Calculated value of F > Tabulated value of F
Otherwise accept H0
Or
Accept H0 if
Calculated value of F < Tabulated value of F
Otherwise reject H0
Step 8: Look for Table value of F
Steps:
1. Find out the two degrees of freedom (one for between and one for within)
2. Denote x for between and y for within [F(x, y)]
3. In the F-distribution table, go along x columns, and down y rows. The point of intersection is your tabulated F-ratio
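The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's own computation: it uses the training-type productivity figures from the one-way ANOVA slide, the function name `one_way_anova` is hypothetical, and the tabulated value 5.14 for F(2, 6) at the 5% level is taken from a standard F table rather than from the slides. The grand mean is computed over all observations, which equals the mean of the sample means when every sample has the same size.

```python
def one_way_anova(samples):
    """Follow Steps 2 to 6: sums of squares, mean squares and the F-ratio."""
    k = len(samples)                                  # number of samples
    n = sum(len(s) for s in samples)                  # total number of items
    means = [sum(s) / len(s) for s in samples]        # mean of each sample
    grand_mean = sum(sum(s) for s in samples) / n     # mean of all observations

    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": k - 1,
        "df_within": n - k,
        "f_ratio": ms_between / ms_within,
    }


# Productivity by training type, copied from the one-way ANOVA slide.
training = {
    "Advanced": [200, 193, 207],
    "Intermediate": [172, 179, 186],
    "Beginners": [130, 125, 119],
}
result = one_way_anova(list(training.values()))

F_TABLE_5_PERCENT = 5.14  # assumption: F(2, 6) at 5%, read from a standard F table
decision = "reject H0" if result["f_ratio"] > F_TABLE_5_PERCENT else "accept H0"
print(result)
print(decision)
```

With these figures the calculated F is far above the tabulated value, so the sketch ends with the null hypothesis of equal mean productivity being rejected.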
EXAMPLE
• Set up an analysis of variance table for the following per acre production data for three varieties of wheat, each grown on 4 plots, and state if the variety differences are significant.
• Test at 5% level of significance
H0 = The difference between varieties is not significant
Ha = The difference between varieties is significant
Interpretation:
Calculated Value of F < Table Value of F
∴ Accept Null Hypothesis
The difference in wheat output due to varieties is not significant and is just a matter of chance.
EXAMPLE
• Ranbaxy Ltd. has purchased three new machines of different makes and wishes to determine whether one of them is faster than the others in producing a certain output.
• Four hourly production figures are observed at random from each machine and the results are given below:
• Use ANOVA and determine whether the machines are significantly different in their mean speed.
Observations    M1    M2    M3
1               28    31    30
2               32    37    28
3               30    38    26
4               34    42    28
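The worked arithmetic below applies the ANOVA steps to the machine data in the table. It is derived here from the table and is not copied from the original slides; the 5% level of significance and the tabulated value F(2, 9) ≈ 4.26 (from a standard F table) are assumptions, since the slide does not state a level for this example.

```latex
\begin{aligned}
\bar{x}_1 &= 31, \quad \bar{x}_2 = 37, \quad \bar{x}_3 = 28, \quad
\bar{\bar{x}} = \frac{31 + 37 + 28}{3} = 32 \\
SS_{between} &= 4(31-32)^2 + 4(37-32)^2 + 4(28-32)^2 = 168 \\
SS_{within} &= 20 + 62 + 8 = 90 \\
SS_{total} &= 168 + 90 = 258 \\
MS_{between} &= \frac{168}{3-1} = 84, \qquad MS_{within} = \frac{90}{12-3} = 10 \\
F &= \frac{84}{10} = 8.4
\end{aligned}
```

Under those assumptions the calculated F (8.4) exceeds the tabulated F (about 4.26), so H0 would be rejected: the machines appear to differ in mean speed.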
TWO WAY ANOVA
• The two-way ANOVA technique is used when the data are classified on the basis of two factors.
• For example, the agricultural output may be classified on the basis of different varieties of seeds and also on the basis of different varieties of fertilizers used.
• Two types of 2-way ANOVA
– Without repeated values
– With repeated values
STEPS IN 2-WAY ANOVA
Steps 1 to 7: the formulas were shown as images on the original slides; only the following relation is recoverable:
SS for residual or error = total SS − (SS between columns + SS between rows)
Step 8: Prepare ANOVA Table
EXAMPLE
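As an illustration of two-way ANOVA without repeated values, the sketch below uses the residual relation quoted above (SS for error = total SS minus the row and column sums of squares). Everything else is an assumption: the yield figures are hypothetical, the function name `two_way_anova` is made up for this example, and the row and column sum-of-squares formulas are the standard textbook ones, since the slides' own formulas did not survive extraction.

```python
def two_way_anova(table):
    """Two-way ANOVA without repeated values (one observation per cell)."""
    r, c = len(table), len(table[0])                  # rows, columns
    grand_mean = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]

    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - (ss_cols + ss_rows)         # residual relation from the slide

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "f_rows": (ss_rows / df_rows) / ms_error,
        "f_cols": (ss_cols / df_cols) / ms_error,
    }


# Hypothetical yields: rows = seed varieties, columns = fertilizer types.
yields = [
    [5, 7, 9],
    [6, 8, 13],
    [4, 6, 5],
]
anova = two_way_anova(yields)
print(anova)
```

Each F value would then be compared with the tabulated F for its own degrees of freedom, (rows − 1, error) and (columns − 1, error), in the same way as in the one-way case.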
RESEARCH PROPOSAL
WHAT IS RESEARCH PROPOSAL?
A research proposal is a document that provides a detailed description of the intended program. It is like an outline of the entire research process that gives a reader a summary of the information discussed in a project.
• A research proposal sets out
– The broad topic you want to research
– What the research is trying to achieve
– How you would do the research
– How much time it would need
– What results it might produce
PURPOSE OF RESEARCH PROPOSAL
• Convince others that the research is worthwhile
• Sell your idea to a funding agency
• Convince readers that the problem is significant and worth studying
• Show that the approach is new and likely to yield results
ELEMENTS OF RESEARCH PROPOSAL
Introduction
Statement of Problem
Purpose of the Study
Review of Literature
Questions and Hypothesis
The Design – Methods & Procedures
Limitations of the Study
Significance of the Study
References
FACTOR ANALYSIS
Observed variables: Color of Bike, Look, Masculine/Feminine, Mileage, Price, Maintenance Cost, Power, Speed, Control, Weight, Brand, Ease of delivery, Financial Assistance, Offer/Discounts, Tyre size, Disc Brake, Smooth Handling, Service Centers
Unobserved FACTORS: Design, Cost, Technical, Comfort
“Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.”
EXAMPLE
Academic ability of student
Quantitative Ability
1. Maths Score
2. Computer Program Score
3. Physics Score
4. Aptitude Test Score
Verbal Ability
1. English
2. Verbal Reasoning Score
PURPOSE OF FACTOR ANALYSIS
• To identify underlying constructs in the data.
• To reduce number of variables
• To reduce redundancy of data (E.g. Quantitative Aptitude)
APPLICATION OF FACTOR ANALYSIS
• Market Segmentation
• Product Research
• Advertising Studies
• Pricing Studies
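Example: the observed variables Friendliness of Staff, Time Spent in Line-up and Assistance via Telephone relate to the unobserved factor Service.
[Diagram: observed variables X1 to X4 linked to unobserved factors F1 to F4 through loadings a1, b1, c1, d1]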
WAYS OF FACTOR ANALYSIS
1. Confirmatory Factor Analysis
– Factors and corresponding variables are already known
– On the basis of literature review or past experience/expertise
2. Exploratory Factor Analysis
– An algorithm is used to explore patterns among variables
– Then factors are explored
– No prior hypothesis to start with
CONDITIONS FOR FACTOR ANALYSIS
• Use interval or ratio data
• Variables are related
• Sufficient number of variables (min 4-5 variables for one factor)
• Large no of observations
• All variables should be normally distributed
STEPS IN FACTOR ANALYSIS
Formulate the Problem
Construct the Correlation Matrix
Determine the method of Factor Analysis
Determine Number of Factors
Estimate the Factor Matrix
Rotate the Factors
Estimate Practical Significance
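The second step above, constructing the correlation matrix, can be sketched as follows. This is only an illustration: the student scores are hypothetical, the variable names are borrowed from the academic-ability example, and `pearson` is a hand-written helper rather than a call into any statistics package. The later steps (extracting and rotating factors) need a numerical library and are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Karl Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical scores of six students on four observed variables.
scores = {
    "maths": [90, 80, 70, 60, 50, 40],
    "physics": [88, 82, 65, 63, 48, 45],
    "english": [55, 70, 60, 75, 58, 66],
    "verbal": [50, 72, 62, 78, 55, 70],
}
names = list(scores)
corr = {(a, b): pearson(scores[a], scores[b]) for a in names for b in names}

# Print the matrix row by row.
for a in names:
    print(a.ljust(8), "  ".join(f"{corr[(a, b)]:6.3f}" for b in names))
```

In this made-up data, maths and physics correlate strongly with each other, as do english and verbal, which is the kind of pattern that would suggest two underlying factors (quantitative and verbal ability).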
DISCRIMINANT ANALYSIS
EXAMPLE
• Basketballer or volleyballer on the basis of anthropometric variables.
• High or low performer on the basis of skill.
• Juniors or seniors category on the basis of the maturity parameters.
DEFINITION
“Discriminant analysis is a multivariate statistical technique used for classifying a set of observations into predefined groups.”
OBJECTIVE
• To understand group differences and to predict the likelihood that a particular entity will belong to a particular class or group based on independent variables.
PURPOSE
• To classify a subject into one of the two groups on the basis of some independent traits.
• To study the relationship between group membership and the variables used to predict the group membership.
SITUATIONS FOR ITS USE
• When the dependent variable is dichotomous or multichotomous.
• Independent variables are metric, i.e. interval or ratio.
ASSUMPTIONS
1. Sample size
– Should be at least five times the number of independent variables.
2. Normal distribution
– Each of the independent variables is normally distributed.
3. Homogeneity of variances/covariances
– All variables have linear and homoscedastic relationships.
4. Outliers
– Outliers should not be present in the data. DA is highly sensitive to the inclusion of outliers.
5. Non-multicollinearity
– There should not be any strong correlation among the independent variables.
6. Mutually exclusive
– The groups must be mutually exclusive, with every subject or case belonging to only one group.
7. Variability
– No independent variable should have zero variability in either of the groups formed by the dependent variable.
Application: to classify players into different categories during the selection process.
CLUSTER ANALYSIS
DEFINITION
• “Cluster analysis is a group of multivariate techniques whose primary purpose is to group objects (e.g., respondents, products, or other entities) based on the characteristics they possess.”
• It is a means of grouping records based upon attributes that make them similar.
• If plotted geometrically, the objects within a cluster will be close together, while different clusters will be far apart.
CLUSTER VS FACTOR ANALYSIS
Cluster analysis is about grouping subjects (e.g. people). Factor analysis is about grouping variables.
Cluster analysis is a form of categorization, whereas factor analysis is a form of simplification.
In cluster analysis, grouping is based on distance (proximity); in factor analysis it is based on variation (correlation).
EXAMPLE
• Suppose a marketing researcher wishes to determine market segments in a community based on patterns of loyalty to brands and stores. A small sample of seven respondents is selected as a pilot test of how cluster analysis is applied. Two measures of loyalty, V1 (store loyalty) and V2 (brand loyalty), were measured for each respondent on a 0-10 scale.
HOW DO WE MEASURE SIMILARITY?
• Proximity Matrix of Euclidean Distance Between Observations
Observation    A        B        C        D        E        F        G
A              ---
B              3.162    ---
C              5.099    2.000    ---
D              5.099    2.828    2.000    ---
E              5.000    2.236    2.236    4.123    ---
F              6.403    3.606    3.000    5.000    1.414    ---
G              3.606    2.236    3.606    5.000    2.000    3.162    ---
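For reference, the Euclidean distance behind a proximity matrix of this kind can be written as below. The subscript notation is an assumption introduced here: $V1_i$ and $V2_i$ denote the store-loyalty and brand-loyalty scores of respondent $i$; the raw scores themselves are not reproduced in this text.

```latex
d_{ij} = \sqrt{\left(V1_i - V1_j\right)^2 + \left(V2_i - V2_j\right)^2}
```

A smaller $d_{ij}$ means respondents $i$ and $j$ are more similar on the two loyalty measures, which is why the pair with the smallest entry in the matrix is combined first.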
HOW DO WE FORM CLUSTERS?
• Identify the two most similar (closest) observations not already in the same cluster and combine them.
• We apply this rule repeatedly to generate a number of cluster solutions, starting with each observation as its own “cluster” and then combining two clusters at a time until all observations are in a single cluster.
• This process is termed a hierarchical procedure because it moves in a stepwise fashion to form an entire range of cluster solutions. It is also an agglomerative method because clusters are formed by combining existing clusters.
AGGLOMERATIVE PROCESS AND CLUSTER SOLUTION
Step columns describe the agglomerative process; the remaining columns describe the cluster solution.
Step               Minimum distance (unclustered observations)    Observation pair    Cluster membership         Number of clusters    Overall similarity measure (average within-cluster distance)
Initial Solution                                                                      (A)(B)(C)(D)(E)(F)(G)      7                     0
1                  1.414                                          E-F                 (A)(B)(C)(D)(E-F)(G)       6                     1.414
2                  2.000                                          E-G                 (A)(B)(C)(D)(E-F-G)        5                     2.192
3                  2.000                                          C-D                 (A)(B)(C-D)(E-F-G)         4                     2.144
4                  2.000                                          B-C                 (A)(B-C-D)(E-F-G)          3                     2.234
5                  2.236                                          B-E                 (A)(B-C-D-E-F-G)           2                     2.896
6                  3.162                                          A-B                 (A-B-C-D-E-F-G)            1                     3.420
• Dendrogram:
Graphical representation (tree graph) of the results of a hierarchical procedure. Starting with each object as a separate cluster, the dendrogram shows graphically how the clusters are combined at each step of the procedure until all are contained in a single cluster.
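The agglomerative rule described above can be sketched directly on the proximity matrix from the slide. This is a minimal illustration under one assumption: the distance between two clusters is taken as the smallest distance between any of their members (single linkage), which reproduces the minimum distances listed in the table. The function name `single_linkage` is hypothetical, and where several pairs tie at 2.000 the sketch may combine them in a different order than the table does.

```python
LABELS = ["A", "B", "C", "D", "E", "F", "G"]

# Lower triangle of the Euclidean proximity matrix, one row per observation.
LOWER = [
    [],
    [3.162],
    [5.099, 2.000],
    [5.099, 2.828, 2.000],
    [5.000, 2.236, 2.236, 4.123],
    [6.403, 3.606, 3.000, 5.000, 1.414],
    [3.606, 2.236, 3.606, 5.000, 2.000, 3.162],
]


def dist(i, j):
    """Distance between observations i and j, read from the lower triangle."""
    return LOWER[max(i, j)][min(i, j)]


def single_linkage(n):
    """Repeatedly combine the two closest clusters until one cluster remains."""
    clusters = [[i] for i in range(n)]       # start: every observation on its own
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        merges.append((d, sorted(LABELS[i] for i in merged)))
    return merges


merges = single_linkage(len(LABELS))
for step, (d, members) in enumerate(merges, start=1):
    print(step, f"{d:.3f}", "-".join(members))
```

The list of merge distances is exactly what a dendrogram plots on its height axis, one join per step.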
USAGE OF CLUSTER ANALYSIS
Market Segmentation:
Splitting customers into different groups/segments where customers have similar requirements.
Segmenting industries/sectors:
Segmenting Markets:
Cities or regions having common traits like population mix, infrastructure development, climatic condition etc.
Career Planning:
Grouping people on the basis of educational qualification, experience, aptitude and aspirations.
Segmenting financial sectors/instruments:
Grouping according to raw material cost, financial allocation, seasonality etc.
CONJOINT ANALYSIS
MEANING
• Concerned with understanding how people make choices between products or services, or a combination of product and service.
• Businesses can then design new products or services that better meet customers' underlying needs.
• Conjoint analysis is a popular marketing research technique that marketers use to determine what features a new product should have and how it should be priced.
• Suppose we want to market a new golf ball. We know from experience and from talking with golfers that there are three important product features:
1. Average Driving Distance
2. Average Ball Life
3. Price
TYPES OF CONJOINT ANALYSIS
1. Choice Based
– Respondents select from grouped options
2. Adaptive Choice
– It is used for studying how people make decisions regarding complex products or services
– Packages adapt based on previous selections
– It gets ‘smarter’ as the survey progresses
3. Menu-based
– Respondents are shown a list of features and levels
– They have to choose among options
– Example: Airtel My Plan
4. Full profile rating based
– Display a series of product profiles
– Typically rated on likelihood to purchase or on a preference scale
5. Self-explicated
– Direct ask of features and levels
– Each feature is presented separately for evaluation
– Respondents rate all remaining features according to desirability
ADVANTAGES
• Estimates psychological tradeoffs that consumers make when evaluating several attributes together
• Measures preferences at the individual level
• Uncovers real or hidden drivers which may not be apparent to the respondents themselves
• Realistic choice or shopping task
• Used to develop needs-based segmentation
DISADVANTAGES
• Designing conjoint studies can be complex
• With too many options, respondents resort to simplification strategies
• Respondents are unable to articulate attitudes toward new categories
• Poorly designed studies may over-value emotional/preference variables and undervalue concrete variables
• Does not take into account the number of items per purchase, so it can give a poor reading of market share
MULTIDIMENSIONAL SCALING
EXAMPLE
A researcher may give test subjects several varieties of apple and have them compare two apples at a time on several criteria. Once every variety has been directly compared with every other variety, the data is plotted on a graph that shows how similar one type is to another.
MEANING
• Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset.
• Multidimensional scaling is a method used to create comparisons between things that are difficult to compare.
• The end result of this process is generally a two-dimensional chart that shows a level of similarity between various items, all relative to one another.
APPLICATIONS OF MDS
• Understanding the position of brands in the marketplace relative to groups of homogeneous consumers.
• Identifying new products by looking for white space opportunities or gaps.
• Gauging the effectiveness of advertising by identifying the brand's position before and after a campaign.
• Assessing the attitudes and perceptions of consumers.
• Determining what attributes the brand owns and what attributes competitors own.
THANK YOU

data analysis in research.pptx

  • 1.
    Presented by AbhijeetBirari UNIT V ANALYSIS OF DATA
  • 2.
    ANALYSIS OF DATA Collection ofData Analysis of Data Draw Logical Inferences
  • 3.
    STATISTICAL SOFTWARE PACKAGES Followthe link >>> http://www.coedu.usf.edu/main/departments/me/MeasurementandResearchStati sticalSoftwarePackages.html
  • 4.
  • 5.
    WHAT IS SPSS? •SPSSStatisticsis asoftware package used for statistical analysis. • S P S Scan be used for: – Processing Questionnaire – Reporting in tables andgraphs – Analyzing • Mean, Median, Mode • Mean Dev& Std. Dev., • Correlation & Regression, • ChiSquare,T-Test,Z-test,ANOVA, MANOVA, FactorAnalysis, ClusterAnalysis, Multidimensional Scalingetc. • Founded in 1968 and acquired by IBM in 2009.
  • 7.
    WHAT IS HYPOTHESIS? “Thestatement speculating the outcome of a research or experiment.” • H0=There is no difference in performance of Div. A, B and C in Semester I • Ha=Business Communication subject has been effective in developing communication skills of students • H0=Biometric system has not improved the attendance of faculties • Ha=Excessive fishing has affected marine life • H0=There is no significant difference in salary of males and females in particular organization. Here, H0=Null Hypothesis Ha=Alternate Hypothesis
  • 8.
    WHAT IS LEVELOF SIGNIFICANCE When null hypothesis is true, you accept it. When it is false, you reject it. 5% level of significance means you are taking 5% risk of rejecting null hypothesis when it happens to be true. It is the maximum value of probability of rejecting H0 when it is true.
  • 9.
    TYPES OF STATISTICALTESTS Tests Meaning When it is used Statistical tests used Parametric Tests Based on assumption that population from where the sample is drawn is normally distributed. Used to test parameters like mean, standard deviation, proportions etc. • T-test • ANOVA • ANCOVA • MANOVA • Karl Pearson Non parametric Tests Don’t require assumption regarding shape of population distribution. Used mostly for categorical variable or in case of small sample size which violates normality. • Chi Square • Mann-Whitney U • Wilcoxon Signed Rank • Kruskal-Wallis • Spearman’s
  • 10.
  • 11.
    INTRODUCTION • Significance ofdifference between means of two samples can bejudged using: – Ztest (>30) – Ttest (<30) • Difficulty ariseswhile measuring difference between means of morethan 2samples • ANOVAis usedin suchcases • ANOVAis usedto test the significance of the difference between morethan two sample means and to makeinferences aboutwhether our samples are drawnfrom population havingsame means Significanceofdifferenceof IQof 2 divisions Ztest orTTest Significanceof differencebetweenperformanceof 5 differenttypesofvehicles ANOVA
  • 12.
    WHEN TO USEANOVA? Compare yield of crop from several variety of seeds Mileage of 4 automobiles Spending habits of five groups of students Productivity of 4 different types of machine during a given period of time Effectiveness of fitness programme on increase in stamina of 5 players
  • 13.
    WHY ANOVA INSTEADOF MULTIPLE T TEST? • If more than two groups,why notjust doseveraltwo samplet-tests to compare the meanfrom one group with the mean from each of the other groups? • The problem with the multiple t-tests approach isthat asthe number of groups increases,the number of two samplet-tests also increases. • Asthe number of tests increasesthe probability of making aType I error also increases.
  • 14.
    ANOVA HYPOTHESES • TheNull hypothesis for ANOVAisthat the meansfor allgroups are equal. • TheAlternative hypothesis for ANOVAisthat at least two of the meansare not equal.
  • 15.
  • 16.
    What is 1-wayANOVA and 2-way ANOVA? Ifwe take only one factor and investigatethe difference among its various categories having numerous possible values, it is called asOne-way ANOVA. Incasewe investigatetwo factors at the sametime, then we useTwo-way ANOVA • • TrainingType Productivity Advanced 200 Advanced 193 Advanced 207 Intermediate 172 Intermediate 179 Intermediate 186 Beginners 130 Beginners 125 Beginners 119 One-wayANOVA Gender Educational Level Marks Male School 89 Male College 50 Male School 90 Male College 80 Female College 50 Female University 40 Female School 91 Female University 56 Two-wayANOVA
  • 17.
    HOW ANOVA WORKS? •Three methods usedto dissolve a powder in water are compared bythe time (in minutes) it takes until the powder isfully dissolved. The results are summarized in the following table: • It isthought that the population means of the three methods m1, m2and m3are not all equal (i.e., at least one m is different from the others). How can this betested?
  • 18.
    • Oneway istouse multiple two-sample t-tests and • compare Method 1with Method 2, • Method 1with Method 3 and • Method 2with Method 3 (comparing all the pairs) • But if eachtest is0.05,the probability of making aType 1error when runningthree tests would increase. • Better method isANOVA(analysis of variance) • Thetechnique requiresthe analysis of different forms of variances– hencethe name. Important:ANOVAis usedto showthat means are different and not variance are different.
  • 19.
    • ANOVAcomparestwo typesof variances • Thevariance withineachsample and • Thevariance between different samples. • The blackdottedarrows showthe per-sample variation of the individual data points aroundthe sample mean (the variancewithin). • The red arrowsshowthe variation of the sample meansaroundthe grand mean (thevariance between).
  • 20.
    STEPS FOR USINGANOVA Null Hypothesis H0: μ1= μ2= μ3 =………=μk Alternate Hypothesis Ha : μ1≠ μ2≠ μ3≠ … … …≠ μk 1. Calculate meanof each sample (x ̄ 1, x ̄ 2, x ̄ 3……x ̄ k) 2. Calculate meanof sample means: Where k=Total number samples 3. Calculate Sumof Square between the samples: Where n1=Total number of item in sample 1 n2=Total number of item in sample 2 n3=Total number of item in sample 3 … … … … … … … … . Step 1 :State NullandAlternate Hypothesis Step2 :ComputeVariance Betweenthe samples X K k X1 X2 X3 ....... X SSbetween n1(x1 x) n2(x2 x) n3(x3 x) ...... nk(xk x) 2 2 2 2
  • 21.
    1. Calculate SumofSquarewithin the samples: SSTotal=SSBetween+ SSWithin Step3 :ComputeVarianceWithin samples 2 2 2 2 SSwithin i(x1i x1) i(x2i x2) i(x3i x3) .... i(xki xk) Step4 :Calculatetotalvariance Step5 :Calculateaveragevariance betweenandwithin samples k SS Between MSbetween 1 SSwithin MSwithin n k N=Totalno of items in all samples K=Numberof samples
  • 22.
    Step6 :Calculate F-ratio within between MS MS Fratio Step7:Set upANOVAtable Sourceof variation Sumof squares(SS) Degreeof freedom (d.f) MeanSquares F-Value (Calculated) Between Samples S SBetween k-1 MSBetween= S SBetween/k-1 F=MSBetween/MS Within Within Samples S SWithin n-k MSWithin= S SWithin/n-k Total S STotal n-1
  • 23.
    Decision Rule: RejectH0if Calculated value of F>Tabulated value of F Otherwise accept H Or Accept H0if Calculated value of F<Tabulated value of F Otherwise reject H 0 0 Step8 : Lookfor Tablevalueof F Steps: 1. Findout two degree of freedom (one for between and onefor within) 2. Denote xfor between and yfor within [F(x,y)] 3. In F-distribution table, go along x columns, and down y rows. Thepoint of intersection isyour tabulated F-ratio
  • 24.
    EXAMPLE • Set upananalysis of variance table for the following per acre production datafor three varieties of wheat, eachgrown on4 plots and state if the variety differences are significant. • Testat 5%level of significance
  • 25.
    H0= The differencebetween varieties is not significant Ha=The difference in varieties is significant
  • 26.
    Interpretation: Calculated Value ofF<TableValue of F ∴Accept Null Hypothesis Difference inwheatoutputdueto varieties isnotsignificantandisjusta matter of chance.
  • 27.
    EXAMPLE • Ranbaxy Ltd.has purchasedthree new machinesof different makesand wishesto determine whether oneof them isfaster than the others in producingacertain output. • Four hourly productionfigures are observed at randomfrom each machine andthe results are given below: • UseANOVAand determine whether machinesare significantly different in their meanspeed. Observations M1 M2 M3 1 28 31 30 2 32 37 28 3 30 38 26 4 34 42 28
  • 28.
  • 29.
  • 30.
  • 31.
    TWO WAY ANOVA •Two-wayANOVAtechnique is usedwhenthe data are classified onthe basis of two factors. • For example, the agricultural output may be classified onthe basis of different varieties of seedsand also onthe basis of different varieties of fertilizers used. • Twotypes of 2-wayANOVA – Without repeated values – With repeated values
  • 32.
  • 33.
    STEPS IN 2-WAYANOVA SSfor residual or error = total SS – (SS between columns + SS between rows) 4 5 6
  • 34.
  • 35.
  • 36.
  • 39.
  • 40.
    WHAT IS RESEARCHPROPOSAL? Aresearch proposal is adocument that provides adetailed description of the intended program. It is like an outline of the entire research processthat gives a reader a summary of the information discussed in a project.
  • 41.
    WHAT IS RESEARCHPROPOSAL? • Research proposal sets out – Broadtopic you want to research – What is it trying to achieve? – How would you do research? – What would betime need? – What results it might produce?
  • 42.
    PURPOSE OF RESEARCHPROPOSAL • Convince others that research is worth • Sellyour idea to funding agency • Convince the problem is significant and worth study • Approach is new and yield results
  • 43.
    ELEMENTS OF RESEARCHPROPOSAL Introduction Statement of Problem Purposeof the Study Reviewof Literature Questionsand Hypothesis The Design– Methods & Procedures Limitationsof the Study Significanceof the Study References
  • 44.
  • 45.
    Colorof Bike Look Masculine/Feminine Mileage Price Maintenance Cost Power Speed Control Weight Brand Easeofdelivery FinancialAssistance Offer/Discounts Tyre size Disc Brake Smooth Handling Service Centers Design Cost Technical Comfort FACTORS Unobserved Observed
  • 46.
    FACTOR ANALYSIS “Factor analysisis astatistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.”
  • 47.
    EXAMPLE Academicability of student QuantitativeAbilityVerbalAbility 1. MathsScore 2. ComputerProgram Score 3. PhysicsScore 4. AptitudeTestScore 1. English 2. Verbal ReasoningScore
  • 48.
    PURPOSE OF FACTORANALYSIS • Toidentify underlying constructs in the data. • To reduce number of variables • To reduce redundancy of data (E.g. Quantitative Aptitude)
  • 49.
    APPLICATION OF FACTORANALYSIS • Market Segmentation • Product Research • Advertising Studies • Pricing Studies
  • 50.
  • 51.
  • 52.
    WAYS OF FACTORANALYSIS 1. Confirmative FactorAnalysis – Factors and corresponding variables are already known – Onthe basis of literature review or past experience/expertise 2. Exploratory FactorAnalysis – Algorithm is usedto explore pattern among variables – Thenfactors are explored – No prior hypothesisto start with
  • 53.
    CONDITIONS FOR FACTORANALYSIS • Use interval or ratio data • Variables are related • Sufficient number of variables (min 4-5 variables for one factor) • Large no of observations • All variables should be normally distributed
  • 54.
    STEPS IN FACTORANALYSIS Formulatethe Problem Constructthe Correlation Matrix Determinethe method of FactorAnalysis Determine Numberof Factors Estimatethe Factor Matrix Rotatethe Factors EstimatingPracticalSignificance
  • 55.
  • 56.
    EXAMPLE • Basketballer orvolleyballer on the basis of anthropometric variables. • High or low performer on the basis of skill. • Juniors or seniors category on the basis of the maturity parameters.
  • 57.
    DEFINITION “Discriminant analysis isa multivariate statistical technique used for classifying aset of observations into pre defined groups.”
  • 58.
    OBJECTIVE • To understandgroup differences and to predict the likelihood that a particular entity will belong to a particular class or group basedon independent variables.
  • 59.
    PURPOSE • Toclassify asubjectinto one of the two groups on the basis of some independent traits. • Tostudy the relationship between group membership and the variables usedto predict the group membership.
  • 60.
    SITUATIONS FOR ITS USE • When the dependent variable is dichotomous or multichotomous. • Independent variables are metric, i.e. interval or ratio. • Example: • Basketballer or volleyballer on the basis of anthropometric variables. • High or low performer on the basis of skill. • Junior or senior category on the basis of maturity parameters.
  • 61.
    ASSUMPTIONS 1. Sample size – Should be at least five times the number of independent variables. 2. Normal distribution – Each of the independent variables is normally distributed. 3. Homogeneity of variances/covariances – All variables have linear and homoscedastic relationships.
  • 62.
    ASSUMPTIONS • Outliers – Outliers should not be present in the data. DA is highly sensitive to the inclusion of outliers. • Non-multicollinearity – The independent variables should not be highly correlated with one another. • Mutually exclusive – The groups must be mutually exclusive, with every subject or case belonging to only one group.
  • 63.
    ASSUMPTIONS • Variability – No independent variable should have zero variability in any of the groups formed by the dependent variable.
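Two of the assumptions listed on these slides (sample size and variability) can be checked mechanically before running a discriminant analysis. The sketch below is only an illustration: the function names, the group labels, and the measurements are hypothetical and do not come from the slides.

```python
from statistics import pvariance

def enough_cases(n_cases, n_predictors, ratio=5):
    """Sample-size rule from the slide: cases >= 5 x independent variables."""
    return n_cases >= ratio * n_predictors

def zero_variance_predictors(groups):
    """Return (group, predictor) pairs whose values do not vary within the group.

    groups: {group_name: {predictor_name: [values, ...]}}
    """
    flagged = []
    for group, predictors in groups.items():
        for name, values in predictors.items():
            if pvariance(values) == 0:
                flagged.append((group, name))
    return flagged

# Hypothetical anthropometric data (cm), invented for illustration only.
players = {
    "basketballer": {"height": [198, 201, 195], "arm_span": [205, 210, 199]},
    "volleyballer": {"height": [190, 193, 188], "arm_span": [200, 200, 200]},
}

print(enough_cases(n_cases=10, n_predictors=2))
print(zero_variance_predictors(players))
```

Here `arm_span` is constant within the hypothetical volleyballer group, so it would be flagged as violating the variability assumption.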
  • 64.
    To classify the players into different categories during the selection process.
  • 66.
  • 67.
    DEFINITION • “Cluster analysis is a group of multivariate techniques whose primary purpose is to group objects (e.g., respondents, products, or other entities) based on the characteristics they possess.” • It is a means of grouping records based upon attributes that make them similar. • If plotted geometrically, the objects within a cluster will be close together, while different clusters will be farther apart.
  • 68.
    CLUSTER VS FACTOR ANALYSIS • Cluster analysis is about grouping subjects (e.g. people). Factor analysis is about grouping variables. • Cluster analysis is a form of categorization, whereas factor analysis is a form of simplification. • In cluster analysis, grouping is based on distance (proximity); in factor analysis, it is based on variation (correlation).
  • 69.
    EXAMPLE • Suppose a marketing researcher wishes to determine market segments in a community based on patterns of loyalty to brands and stores. A small sample of seven respondents is selected as a pilot test of how cluster analysis is applied. Two measures of loyalty, V1 (store loyalty) and V2 (brand loyalty), were measured for each respondent on a 0-10 scale.
  • 71.
    HOW DO WE MEASURE SIMILARITY? • Proximity Matrix of Euclidean Distances Between Observations
    Observation   A       B       C       D       E       F       G
    A             ---
    B             3.162   ---
    C             5.099   2.000   ---
    D             5.099   2.828   2.000   ---
    E             5.000   2.236   2.236   4.123   ---
    F             6.403   3.606   3.000   5.000   1.414   ---
    G             3.606   2.236   3.606   5.000   2.000   3.162   ---
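Each entry in the proximity matrix is a straight-line distance between two respondents in the (V1, V2) plane. Assuming the standard Euclidean distance, and writing i and j for two respondents (the subscripts are our notation, not the slide's), it can be written as:

```latex
d_{ij} = \sqrt{\left(V_{1i} - V_{1j}\right)^{2} + \left(V_{2i} - V_{2j}\right)^{2}}
```

For instance, the smallest entry in the matrix, 1.414 for the pair E-F, is the value this formula gives when two respondents differ by one point on each measure, since the square root of 1 + 1 is about 1.414.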
  • 72.
    HOW DO WE FORM CLUSTERS? • Identify the two most similar (closest) observations not already in the same cluster and combine them. • We apply this rule repeatedly to generate a number of cluster solutions, starting with each observation as its own “cluster” and then combining two clusters at a time until all observations are in a single cluster. • This process is termed a hierarchical procedure because it moves in a stepwise fashion to form an entire range of cluster solutions. It is also an agglomerative method because clusters are formed by combining existing clusters.
  • 73.
    AGGLOMERATIVE PROCESS AND CLUSTER SOLUTION
    Columns: Step; Minimum Distance Between Unclustered Observations; Observation Pair; Cluster Membership; Number of Clusters; Overall Similarity Measure (Average Within-Cluster Distance)
    Step               Min. distance   Pair   Cluster membership        Clusters   Avg. within-cluster distance
    Initial solution   ---             ---    (A)(B)(C)(D)(E)(F)(G)     7          0
    1                  1.414           E-F    (A)(B)(C)(D)(E-F)(G)      6          1.414
    2                  2.000           E-G    (A)(B)(C)(D)(E-F-G)       5          2.192
    3                  2.000           C-D    (A)(B)(C-D)(E-F-G)        4          2.144
    4                  2.000           B-C    (A)(B-C-D)(E-F-G)         3          2.234
    5                  2.236           B-E    (A)(B-C-D-E-F-G)          2          2.896
    6                  3.162           A-B    (A-B-C-D-E-F-G)           1          3.420
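The agglomerative rule described above (repeatedly combine the two closest clusters) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the slides: it takes the distances from the proximity matrix as given and assumes that "closest" means the smallest distance between any pair of members, which is usually called single linkage. The function and variable names are our own.

```python
from itertools import combinations

# Pairwise Euclidean distances copied from the proximity matrix slide.
DIST = {
    ("A", "B"): 3.162, ("A", "C"): 5.099, ("A", "D"): 5.099,
    ("A", "E"): 5.000, ("A", "F"): 6.403, ("A", "G"): 3.606,
    ("B", "C"): 2.000, ("B", "D"): 2.828, ("B", "E"): 2.236,
    ("B", "F"): 3.606, ("B", "G"): 2.236,
    ("C", "D"): 2.000, ("C", "E"): 2.236, ("C", "F"): 3.000,
    ("C", "G"): 3.606,
    ("D", "E"): 4.123, ("D", "F"): 5.000, ("D", "G"): 5.000,
    ("E", "F"): 1.414, ("E", "G"): 2.000,
    ("F", "G"): 3.162,
}

def distance(x, y):
    """Look up the distance between two observations in either order."""
    return DIST[(x, y)] if (x, y) in DIST else DIST[(y, x)]

def cluster_distance(c1, c2):
    """Single linkage (assumed): closest pair of members across two clusters."""
    return min(distance(x, y) for x in c1 for y in c2)

def agglomerate(observations):
    """Merge the two closest clusters until one remains; record every step."""
    clusters = [(name,) for name in observations]
    steps = []
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merge_distance = cluster_distance(clusters[i], clusters[j])
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        steps.append((merge_distance, merged, len(clusters)))
    return steps

steps = agglomerate("ABCDEFG")
for number, (merge_distance, merged, remaining) in enumerate(steps, start=1):
    print(number, merge_distance, "-".join(merged), remaining)
```

The merge distances come out in the same sequence as the "minimum distance" column of the table. Three candidate merges are tied at 2.000, so this sketch may combine them in a different order than the table does; the clusters reached once the ties are resolved are the same.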
  • 75.
    • Dendrogram: Graphical representation (tree graph) of the results of a hierarchical procedure. Starting with each object as a separate cluster, the dendrogram shows graphically how the clusters are combined at each step of the procedure until all are contained in a single cluster.
  • 76.
    USAGE OF CLUSTER ANALYSIS • Market Segmentation: Splitting customers into different groups/segments where customers have similar requirements. • Segmenting industries/sectors: • Segmenting Markets: Cities or regions having common traits like population mix, infrastructure development, climatic conditions etc. • Career Planning: Grouping people on the basis of educational qualification, experience, aptitude and aspirations. • Segmenting financial sectors/instruments: Grouping according to raw material cost, financial allocation, seasonality etc.
  • 77.
  • 78.
  • 79.
    MEANING • Concerned with understanding how people make choices between products, services, or a combination of product and service. • Businesses can design new products or services that better meet customers' underlying needs. • Conjoint analysis is a popular marketing research technique that marketers use to determine what features a new product should have and how it should be priced.
  • 80.
    • Suppose we want to market a new golf ball. We know from experience and from talking with golfers that there are three important product features: 1. Average Driving Distance 2. Average Ball Life 3. Price
  • 81.
    TYPES OF CONJOINT ANALYSIS 1. Choice-based – Respondents select from grouped options
  • 82.
    TYPES OF CONJOINT ANALYSIS 2. Adaptive Choice – It is used for studying how people make decisions regarding complex products or services – Packages adapt based on previous selections – It gets ‘smarter’ as the survey progresses
  • 83.
  • 84.
    TYPES OF CONJOINT ANALYSIS 3. Menu-based 1. Respondents are shown a list of features and levels 2. They have to choose among options 3. Example: Airtel My Plan
  • 85.
  • 86.
    4. Full-profile rating-based – Displays a series of product profiles – Typically rated on a likelihood-to-purchase or preference scale
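In a full-profile design, each profile shown to a respondent combines one level of every feature. The sketch below enumerates such profiles for the three golf ball features named earlier. The levels ("long", "medium", and so on) are hypothetical placeholders, since the slides do not list any, and `full_profiles` is our own helper name.

```python
from itertools import product

# Features come from the golf ball example; the levels are hypothetical.
FEATURES = {
    "Average Driving Distance": ["long", "medium", "short"],
    "Average Ball Life": ["long", "medium", "short"],
    "Price": ["low", "medium", "high"],
}

def full_profiles(features):
    """Build every combination of one level per feature (a full factorial)."""
    names = list(features)
    return [dict(zip(names, levels)) for levels in product(*features.values())]

profiles = full_profiles(FEATURES)
print(len(profiles))   # 3 x 3 x 3 combinations
print(profiles[0])
```

With three levels per feature the full factorial already contains 27 profiles, which shows why designing conjoint studies can become complex as features and levels are added.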
  • 87.
    5. Self-explicated – Direct ask of features and levels – Each feature is presented separately for evaluation – Respondents rate all remaining features according to desirability
  • 88.
    ADVANTAGES • Estimates psychological tradeoffs that consumers make when evaluating several attributes together • Measures preferences at the individual level • Uncovers real or hidden drivers which may not be apparent to the respondents themselves • Realistic choice or shopping task • Used to develop needs-based segmentation
  • 89.
    DISADVANTAGES • Designing conjoint studies can be complex • With too many options, respondents resort to simplification strategies • Respondents are unable to articulate attitudes toward new categories • Poorly designed studies may over-value emotional/preference variables and undervalue concrete variables • Does not take into account the number of items per purchase, so it can give a poor reading of market share
  • 90.
  • 91.
    EXAMPLE A researcher may give test subjects several varieties of apple and have them make comparisons on several criteria between two apples at a time. Once each variety has been directly compared with every other variety, the data is plotted on a graph that shows how similar one type is to another.
  • 92.
    MEANING • Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. • Multidimensional scaling is a method used to create comparisons between things that are difficult to compare. • The end result of this process is generally a two-dimensional chart that shows the level of similarity between various items, all relative to one another.
  • 94.
    APPLICATIONS OF MDS • Understanding the position of brands in the marketplace relative to groups of homogeneous consumers. • Identifying new products by looking for white space opportunities or gaps. • Gauging the effectiveness of advertising by identifying the brand's position before and after a campaign. • Assessing the attitudes and perceptions of consumers. • Determining what attributes the brand owns and what attributes competitors own.
  • 95.