2. UNIT-IV
DATA PREPARATION AND ANALYSIS
SYLLABUS
Data Preparation – Editing – Coding – Data entry – Validity of data – Qualitative
vs. Quantitative data analyses – Bivariate and Multivariate statistical techniques
– Factor analysis – Discriminant analysis – Cluster analysis – Multiple
regression and Correlation – Multidimensional scaling – Conjoint analysis –
Application of statistical software for data analysis.
3. The data after collection has to be
Processed
Prepared and
Analyzed
4. • The collected data must undergo some processing before analysis.
Checking the questionnaire and schedules
Minimizing the errors
5. THE DATA PREPARATION PROCESS
DATA EDITING
DATA TABULATION
DATA CLASSIFICATION
DATA CODING
EXPLORATORY DATA ANALYSIS
6. • The Processing of data involves activities such as
Editing
Coding and
Tabulation of collected data
Editing
• Editing means inspecting, correcting and modifying the collected
data.
7. DATA CODING
Coding is the process of identifying the responses given by the
respondent and assigning a numeral to each of them.
8. SAMPLE CODE BOOK EXTRACT
Question No.   Variable Name    Symbol   Coding Instruction
1.             Age              X1       Less than 20 yrs = 1, 21 to 26 years = 2,
                                         27 to 35 years = 3, 36 to 45 years = 4,
                                         More than 45 years = 5
2.             Gender           X2       Male = 1, Female = 2
3.             Marital status   X3       Single = 1, Married = 2, Divorced/widow = 3
4.             Family size      X4       One to two = 1, Three to five = 2,
                                         Six & more = 3
(Symbol = symbol used for variable name)
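A minimal sketch of how the code book extract above could be applied to one questionnaire response. The field names (gender, marital_status, family_size), the helper functions and the sample respondent are hypothetical and used only for illustration; only the codes themselves come from the code book.

```python
# Codes copied from the code book extract: variable symbol -> {response: code}.
CODE_BOOK = {
    "X2": {"Male": 1, "Female": 2},                          # Gender
    "X3": {"Single": 1, "Married": 2, "Divorced/widow": 3},  # Marital status
}


def code_family_size(members):
    """Code family size (X4): 1-2 -> 1, 3-5 -> 2, 6 or more -> 3."""
    if members < 1:
        raise ValueError("family size must be at least 1")
    if members <= 2:
        return 1
    if members <= 5:
        return 2
    return 3


def code_response(raw):
    """Convert one raw response (hypothetical field names) to coded form."""
    return {
        "X2": CODE_BOOK["X2"][raw["gender"]],
        "X3": CODE_BOOK["X3"][raw["marital_status"]],
        "X4": code_family_size(raw["family_size"]),
    }


# Hypothetical respondent, for illustration only.
respondent = {"gender": "Female", "marital_status": "Married", "family_size": 4}
coded = code_response(respondent)
print(coded)
```

A response that is not listed in the code book raises a KeyError, which is one simple way to catch entry errors during editing.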
10. Tabulation
• Tabulation is the summarization of results in the form of statistical
tables.
• The tabulation may be done entirely by manual methods or electronic
methods.
11. EXPLORATORY DATA ANALYSIS
Sample characteristics: age group of the sample
Age groups Frequency Percent
20-25 27 27.0
26-30 37 37.0
31-35 9 9.0
36-40 22 22.0
41-45 3 3.0
46 & above 2 2.0
Total 100 100.0
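The Percent column of the table above can be reproduced from the Frequency column. The sketch below uses the frequencies from the slide; the variable names are illustrative assumptions.

```python
# Frequencies taken from the age-group table on this slide.
frequencies = {
    "20-25": 27,
    "26-30": 37,
    "31-35": 9,
    "36-40": 22,
    "41-45": 3,
    "46 & above": 2,
}

total = sum(frequencies.values())

# Percent = frequency / total * 100, rounded to one decimal place.
percents = {group: round(100 * count / total, 1)
            for group, count in frequencies.items()}

for group, count in frequencies.items():
    print(f"{group:<12}{count:>5}{percents[group]:>8.1f}")
print(f"{'Total':<12}{total:>5}{sum(percents.values()):>8.1f}")
```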
17. MEANING OF UNIVARIATE, BIVARIATE &
MULTIVARIATE ANALYSIS OF DATA
• Univariate Analysis – One variable is analyzed at a
time.
• Bivariate Analysis – Two variables are analyzed
together and examined for any possible association
between them.
• Multivariate Analysis – More than two variables are
analyzed at a time.
19. 1. Correlation
• Correlation is the study of the linear relationship between two variables.
• Correlation analysis is the statistical tool used to describe the
degree to which one variable is linearly related to another.
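One common measure of this degree of linear relationship is the Pearson correlation coefficient r, which ranges from -1 to +1. The slides do not name a specific coefficient, so treating Pearson's r as the intended measure is an assumption. The sketch below computes it with the standard library only; the income and spending figures are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: income and spending for five respondents.
income = [10, 20, 30, 40, 50]
spending = [8, 15, 19, 30, 33]
r = pearson_r(income, spending)
print(round(r, 3))
```

A value of r close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one.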
23. DESCRIPTIVE ANALYSIS OF BIVARIATE
DATA
Refining an initial relationship:
The data reported below represent the relationship between consumption of
ice cream and income level.
The above table indicates that 55 per cent of high income respondents
fall into high consumption category as compared to 30 per cent of low
income respondents.
24. REGRESSION
• Regression is the determination of a statistical relationship
between two or more variables.
• One variable (the independent variable) is treated as the cause of
the behaviour of the other (the dependent variable).
• Example: the impact of age and gender (the independent, or predictor,
variables) on height (the dependent variable).
25. • The basic relationship between X and Y is
Y = a + bX
• It means that each unit change in X produces a change of b in Y.
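A minimal sketch of how a and b in Y = a + bX can be estimated from data. It assumes the ordinary least-squares method, which the slides do not name explicitly, and the data values are hypothetical.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope: change in Y per unit change in X
    a = mean_y - b * mean_x  # intercept: value of Y when X = 0
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a, b = fit_simple_regression(x, y)
print(a, b)
```

With these figures each unit increase in X raises the fitted Y by b = 2.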
26. MULTIPLE REGRESSION
• When there are two or more independent variables, the analysis
of the relationship is known as multiple regression.
• With two independent variables, the multiple regression equation
takes the form
Y = a + b1X1 + b2X2
where X1 and X2 are the independent variables and Y is
the dependent variable.
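The coefficients a, b1 and b2 can be estimated by least squares. The sketch below is one way to do this with the standard library only, by solving the normal equations; the method, the function names and the data are assumptions made for illustration. In practice a statistical package would normally be used.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix so that row operations apply to both sides.
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution, from the last unknown to the first.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_multiple_regression(x1, x2, y):
    """Least-squares estimates [a, b1, b2] for Y = a + b1*X1 + b2*X2."""
    # Design matrix rows: a column of ones for the intercept, then X1 and X2.
    rows = [(1.0, u, v) for u, v in zip(x1, x2)]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]
a, b1, b2 = fit_multiple_regression(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```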
27. TWO-WAY ANOVA
• The ANOVA (Analysis of Variance) technique is important in the context
of all those situations where we want to compare more than two
populations
28. • For example:
• Various types of drugs manufactured for curing a specific
disease may be studied, and the differences among them judged to
be significant or not, through the application of the ANOVA technique.
• The basic principle of ANOVA is to test whether the differences
occur due to ‘random effects’ or due to a ‘specific factor’.
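The principle can be sketched with the simplest case, a one-way ANOVA, where the F statistic compares variation between groups (the specific factor) with variation within groups (random effects). This is a one-way illustration rather than the two-way design named in the slide title, and the recovery times for the three drugs are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: attributed to the specific factor.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group variation: attributed to random effects.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical recovery times (days) under three drugs.
drug_a = [4, 5, 6]
drug_b = [6, 7, 8]
drug_c = [8, 9, 10]
f_stat = one_way_anova_f([drug_a, drug_b, drug_c])
print(f_stat)  # 12.0 for these figures
```

The F value is then compared with the critical value of the F distribution for (k - 1, n - k) degrees of freedom to decide whether the differences are significant.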
30. VARIABLES IN MULTIVARIATE ANALYSIS
Explanatory and criterion variables:
• If X may be considered to be the cause of Y, then X is described as
the explanatory variable (also termed the causal or independent
variable) and Y is described as the criterion variable (also termed
the resultant or dependent variable).
32. INTRODUCTION TO FACTOR ANALYSIS
• Factor analysis is a multivariate statistical technique in
which there is no distinction between dependent and
independent variables.
• The purpose of Factor analysis is data reduction and
summarization.
• It is a very useful method to reduce a large number of
variables resulting in data complexity to a few
manageable factors.
33. • For instance, we might have data, say, about an
individual’s income, education, occupation and
dwelling area and want to infer from these some factor
(such as social class) which summarises the
commonality of all the said four variables.
35. • Discriminant analysis enables the researcher to
classify persons or objects into two or more categories.
• For example, consumers may be classified as heavy and light
users.
36. • A company may discriminate between its agents,
classifying them as high-performance or low-performance
agents on the basis of their annual turnover.
• High performance and low performance – dependent
variable
• Annual turnover – independent variable
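A minimal sketch of the agent example with a single predictor. It assumes equal variances and equal prior probabilities for the two groups, in which case a linear discriminant rule reduces to a cutoff at the midpoint of the two group means. The turnover figures and function names are hypothetical.

```python
def midpoint_cutoff(high_group, low_group):
    """Cutoff halfway between the two group means (one-predictor case)."""
    mean_high = sum(high_group) / len(high_group)
    mean_low = sum(low_group) / len(low_group)
    return (mean_high + mean_low) / 2


def classify_agent(turnover, cutoff):
    """Assign an agent to a group by comparing turnover with the cutoff."""
    return "high performance" if turnover > cutoff else "low performance"


# Hypothetical annual turnover of agents whose group is already known.
high_performers = [90, 100, 110]
low_performers = [30, 40, 50]

cutoff = midpoint_cutoff(high_performers, low_performers)
label = classify_agent(85, cutoff)  # a new agent with turnover 85
print(cutoff, label)
```

With several predictors the cutoff is applied to a weighted combination of the predictors (the discriminant score) rather than to a single variable.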
38. • The search for relatively homogeneous groups of
objects is called cluster analysis.
• In marketing, cluster analysis is used to identify
persons with similar buying habits.
• It makes no distinction between dependent and
independent variables.
39. • For example, cluster analysis can be illustrated with
employees A to Z of a company and their monthly
salaries.
• A two-dimensional perceptual map is drawn on
the basis of data relating to (i) the monthly expenditure of
the employees and (ii) the monthly income of the
employees.
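Grouping employees on the income-expenditure map can be sketched with k-means, one common clustering algorithm. The slides do not name an algorithm, so k-means is an assumption here, and the (income, expenditure) figures and starting centroids are hypothetical.

```python
def k_means(points, centroids, iterations=10):
    """Basic k-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical (monthly income, monthly expenditure) pairs, in thousands.
employees = [(20, 15), (22, 18), (21, 16), (60, 40), (62, 45), (58, 42)]
start = [employees[0], employees[3]]  # arbitrary starting centroids

centroids, clusters = k_means(employees, start)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Each resulting cluster is a relatively homogeneous group: employees within it have similar income and expenditure, and differ from those in the other cluster.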
42. MULTIDIMENSIONAL SCALING (MDS)
BASIC TENETS
• MDS is only one of the techniques that can be used for perceptual
mapping.
• The inputs obtained could be for objects, individuals, brands,
corporations or countries.
• The grouped objects are usually evaluated and compared
with each other so that they can coexist on a spatial map.