STUDENT EXAMINATION NUMBER Y1401956
MODULE NO: MAN00029M
MODULE TITLE: Quantitative Methods & Data Analysis
Module Tutor: Dr. Harry Venables
Essay Title: Final assessment
Word Count: 2688
Task 1
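Before any data manipulation can begin, the Data View and Variable View in SPSS must be set up according to the rules, so that SPSS computes its output properly. The table below lists each variable with its label, value codes, original and corrected measurement type, and the rationale for the change.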
| Name | Label | Values | Type | New type | Rationale |
|---|---|---|---|---|---|
| Obs | Id number | None | Nominal | Scale | Variable items that are not measurable but are numeric, like ID numbers and phone numbers (can also be Nominal). |
| Gender | Gender | 0=Female, 1=Male | Nominal | Nominal | Variable items are all numbers that represent categories and have no order to them, e.g. 1=Blue car, 2=Cat, 0=Male, 1=Female. |
| Age | Age (years) | None | Nominal | Scale | Variable items are all measurable numbers, e.g. height in cm. |
| Status | Marital Status | 1=Single, 2=Married, 3=Divorced, 4=Widowed | Nominal | Nominal | Variable items are all numbers that represent categories and have no order to them, e.g. 1=Blue car, 2=Cat, 0=Male, 1=Female. |
| Occupation | Occupation | 1=Student, 2=Employed, 3=Self-employed, 4=Retired | Nominal | Nominal | Variable items are all numbers that represent categories and have no order to them, e.g. 1=Blue car, 2=Cat, 0=Male, 1=Female. |
| AvgMonthlySpending | Average Monthly Spend (GBP) | None | Nominal | Scale (custom currency) | Variable items are all measurable monetary values. |
| MonthlyVisits | Number of Monthly Visits | None | Nominal | Scale | Variable items are all measurable numbers, e.g. height in cm. |
| Distance | Distance Travelled (miles) | None | Nominal | Scale | Variable items are all measurable numbers, e.g. height in cm. |
| Car | Vehicle Ownership | 0=No, 1=Yes | Nominal | Nominal | Variable items are all numbers that represent categories and have no order to them, e.g. 1=Blue car, 2=Cat, 0=Male, 1=Female. |
| Appreciation | Customer Appreciation | 1=Very Low, 2=Low, 3=Indifferent, 4=High, 5=Very High | Nominal | Ordinal | Variable items are numbers that represent some form of ranking or order, e.g. Likert-scale values 1-5 or 1-7. |
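The value codes and measurement levels in the table above can also be kept as a small codebook, which makes it easy to translate a stored code back into its label. This is a minimal sketch, not SPSS syntax: the dictionary layout and the helpers `decode` and `variables_at_level` are hypothetical, and the variable names are assumed to be those listed in the table.

```python
# Hypothetical codebook mirroring the Variable View settings in the table above.
# Each entry maps a variable name to (value labels or None, corrected measurement level).
CODEBOOK = {
    "Obs": (None, "scale"),
    "Gender": ({0: "Female", 1: "Male"}, "nominal"),
    "Age": (None, "scale"),
    "Status": ({1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}, "nominal"),
    "Occupation": ({1: "Student", 2: "Employed", 3: "Self-employed", 4: "Retired"}, "nominal"),
    "AvgMonthlySpending": (None, "scale"),
    "MonthlyVisits": (None, "scale"),
    "Distance": (None, "scale"),
    "Car": ({0: "No", 1: "Yes"}, "nominal"),
    "Appreciation": ({1: "Very Low", 2: "Low", 3: "Indifferent", 4: "High", 5: "Very High"}, "ordinal"),
}


def decode(variable, code):
    """Return the value label for a stored code; scale variables have no labels."""
    labels, _level = CODEBOOK[variable]
    return labels[code] if labels else code


def variables_at_level(level):
    """List the variables assigned a given measurement level, in codebook order."""
    return [name for name, (_labels, lvl) in CODEBOOK.items() if lvl == level]


print(decode("Status", 2))            # Married
print(variables_at_level("ordinal"))  # ['Appreciation']
```

Keeping labels and levels together in one place mirrors what Variable View does: nominal and ordinal variables carry value labels, while scale variables are used as raw numbers.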
Task 2
A. The bar chart describes the target consumers of the FreshCo retail centre in terms of two characteristics: marital status and occupation. A cross-tabulation (Table 2.1) is therefore used to analyse the two variables together and produce an appropriate bar chart.
Table 2.1
Table 2.2
Table 2.2 shows that there are 201 respondents and two modes, one for each variable.
From this chart it can be concluded that most of FreshCo’s consumers are employed (116 of 201 respondents) and that married respondents form the largest marital-status group (89 of 201 respondents). Thus the modal categories are ‘Employed’ for occupation and ‘Married’ for marital status, because these values occur most frequently (Appendix 2) (Field, 2009:21).
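A cross-tabulation is simply a count of respondents in each combination of categories, and the mode of a variable is its most frequent category. The sketch below shows both calculations with Python’s standard library on a handful of made-up records (not the FreshCo data); `crosstab` is a hypothetical helper name.

```python
from collections import Counter
from statistics import multimode

# Made-up (status, occupation) records for illustration only; not the survey data.
records = [
    ("Married", "Employed"), ("Married", "Employed"), ("Single", "Student"),
    ("Single", "Employed"), ("Married", "Retired"), ("Divorced", "Self-employed"),
    ("Married", "Employed"), ("Widowed", "Retired"),
]


def crosstab(pairs):
    """Count how many records fall into each (row, column) combination."""
    return Counter(pairs)


table = crosstab(records)
status_mode = multimode(status for status, _ in records)
occupation_mode = multimode(occupation for _, occupation in records)

print(table[("Married", "Employed")])  # 3
print(status_mode, occupation_mode)    # ['Married'] ['Employed']
```

`multimode` returns every value tied for the highest count, so a single variable with two equally frequent categories would come back as a two-element list, i.e. a genuinely bimodal variable.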
B. This target-consumer analysis uses the same characteristics as above (marital status and occupation) and the differences between them, but this time with regard to car ownership.
Divorced respondents who are employed or self-employed, and widowed respondents who are retired, are the groups that do not own a car. Among these, divorced self-employed respondents form the largest group without a car. Conversely, married employed respondents form the largest group of car owners; single students are the second largest group of car owners, and single employed respondents the smallest.
C. Given that married, employed car owners are FreshCo’s largest customer group, providing enough parking spaces could become an issue. Convenient opening hours could also make a significant difference for working customers. Marital status may also indicate the presence of children, and hence a need for family facilities on site such as playgrounds and food courts.
Task 3
A. Consumer spending with regard to consumer characteristics.
Extreme values (outliers) occur among male students and self-employed females. Outliers are values that deviate markedly from the rest of the responses; in this case, three respondents gave unusually high answers for average monthly spending. The number next to each outlier indicates the row number of that respondent (SAGE, 2015). In the self-employed female group, one respondent spends more per month (£522.59) than the rest. In the male student group, two respondents spend more than the rest of the group (£219.84 and £225.84).
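SPSS box plots mark as outliers the values lying more than 1.5 interquartile ranges beyond the edges of the box, and as extreme values those lying more than 3 interquartile ranges beyond them. The sketch below applies these fences to made-up spending figures (not the survey data); `box_plot_flags` is a hypothetical helper. Python’s `statistics.quantiles` may place the quartiles slightly differently from the hinges SPSS uses to draw the box, so borderline cases can differ.

```python
from statistics import quantiles


def box_plot_flags(values):
    """Flag values beyond the box-plot fences.

    'outlier': more than 1.5 x IQR beyond the box; 'extreme': more than 3 x IQR beyond it.
    """
    q1, _median, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    flags = {}
    for v in values:
        beyond = max(q1 - v, v - q3)  # distance outside the box (negative if inside)
        if beyond > 3 * iqr:
            flags[v] = "extreme"
        elif beyond > 1.5 * iqr:
            flags[v] = "outlier"
    return flags


# Made-up monthly spending values, not the survey data.
spending = [10, 12, 13, 14, 15, 16, 17, 18, 40, 60]
print(box_plot_flags(spending))  # {40: 'outlier', 60: 'extreme'}
```

Here Q1 = 12.75 and Q3 = 23.5, so the interquartile range is 10.75: 40 lies 16.5 above the box (beyond 1.5 x IQR) and 60 lies 36.5 above it (beyond 3 x IQR).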
Medians vary considerably across occupations, whereas across genders they are not significantly different (they overlap). Employed and self-employed respondents of both genders spend more than the other groups, and the difference appears significant because their confidence intervals do not overlap. There is also some difference between the employed and self-employed groups, because their boxes barely overlap (for women the difference in spending is smaller, as the boxes overlap slightly). The lower median of the self-employed group indicates lower spending than in the employed group, whose median is higher. The interquartile ranges differ slightly in length and position, which indicates a different dispersion of data in the two groups (the self-employed group’s interquartile range is smaller) (Field, 2009:100-2).
There are no significant differences between the expenditure of students and retired individuals, because their boxes overlap. However, the spread of the student interquartile range differs by gender: the male group’s range is smaller, and male students are less likely than female students to spend larger amounts (Ibid).
B. Average monthly expenditure according to level of appreciation.
The average expenditure of customers with ‘very high’ appreciation differs markedly from the rest, because its median and box (including confidence intervals) lie far from the others and do not overlap with them. The interquartile ranges of the ‘very low’ and ‘low’ appreciation groups are much wider than the rest, indicating a wider dispersion of data (Field, 2009:101-2).
Some box plots show skewed, asymmetric data, which needs to be examined more closely through further statistical analysis.
The descriptive statistics show that the means and medians of the ‘very low’, ‘low’, ‘indifferent’ and ‘high’ appreciation groups are close to each other. However, the mean (197.8941) and median (199.9549) of the ‘very high’ group are markedly lower and differ from the rest.
The standard deviations of the ‘very low’ and ‘low’ groups also differ: they are larger, which shows numerically that their values are spread more widely around the mean. The interquartile ranges of both groups also differ considerably from the rest. There is slight positive skewness in the ‘high’ and ‘indifferent’ groups and slight negative skewness in the others, indicating a slight asymmetry in the data distribution (Field, 2009:19).
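The sign of the skewness statistic tells us which tail is longer. The sketch below computes the adjusted Fisher-Pearson coefficient G1 from raw values; we assume this is the form reported as ‘Skewness’ in SPSS descriptive output, and the example values are made up.

```python
from statistics import mean, stdev


def skewness(values):
    """Adjusted Fisher-Pearson skewness G1, based on the sample standard deviation."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    m, s = mean(values), stdev(values)
    cubed = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


# Made-up examples: a long right tail gives positive skew, a long left tail negative skew.
print(skewness([1, 1, 2, 2, 10]) > 0)          # True
print(skewness([10, 9, 9, 8, 8, 0]) < 0)       # True
print(abs(skewness([1, 2, 3, 4, 5])) < 1e-12)  # True: symmetric data
```

Values near zero, like the slight positive and negative skewness described above, indicate only mild asymmetry.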
Customers who tend to spend the least have the highest appreciation, whereas those who spend the most are indifferent or have low or very low appreciation.
Task 4: Distribution of customers’ monthly expenditure.
According to the histogram, the data for average monthly spending are not normally distributed. The distribution is flat with a negative skew; the bars depart from the normal curve and are clearly split into two groups. The Normal Q-Q plot, which charts the observed values against the values expected under a normal distribution, shows some deviation towards the tail: the data points lie well away from the line and even cross it, which shows that the distribution is not normal. The spending data are close to normal only around the spending values 180.000 and 5300.00; in general, the values do not follow the normal distribution. The detrended plot is another view of the Q-Q plot, with the trend line removed so that each point’s deviation from it is shown directly; it reveals the non-normality of the distribution even more clearly.
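The points of a Normal Q-Q plot and its detrended version can be computed directly. The sketch below pairs each sorted observation with the normal score expected at its rank, assuming Blom’s plotting position (i - 3/8)/(n + 1/4), which we take to be the SPSS default; the detrended value is the standardised observation minus that expected score. The function name and the input values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Return (observed, expected_z, deviation) triples for a Normal Q-Q plot.

    expected_z uses Blom's plotting position (i - 3/8) / (n + 1/4);
    deviation is the standardised observation minus expected_z (detrended plot).
    """
    data = sorted(values)
    n = len(data)
    m, s = mean(data), stdev(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(data, start=1):
        expected_z = std_normal.inv_cdf((i - 0.375) / (n + 0.25))
        deviation = (x - m) / s - expected_z
        points.append((x, expected_z, deviation))
    return points


# Made-up spending values; deviations far from 0 suggest non-normality.
for observed, expected, dev in qq_points([95.0, 120.0, 150.0, 155.0, 160.0, 300.0]):
    print(f"{observed:7.2f}  expected z = {expected:+.3f}  deviation = {dev:+.3f}")
```

For normally distributed data the deviations scatter randomly around zero; a systematic pattern, such as the curve crossing the line, is what signals non-normality.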
The box plot shows no outliers. The central section of the data is not symmetrically distributed, because the median is not centred in the box.
Distribution of distance travelled
The distance travelled data seem closer to a normal distribution. However, the Normal Q-Q plot shows a slight deviation towards the end of the tail, and the Detrended Normal Q-Q plot gives a closer look, revealing that the data are not as normally distributed as they first appear. The box plot has three outliers. The median is almost centred in the box, so the central section of the data is almost symmetrically distributed.
Normality is hard to judge without carrying out a formal test of normality.
The table shows skewness in both cases, which indicates a non-normal distribution: more so for monthly spending (-.900) than for distance travelled (.612).
According to the Shapiro-Wilk test (generally considered the more reliable of the two normality tests SPSS reports), Sig. (p < 0.05) shows that neither average monthly spending nor distance travelled is normally distributed. The null hypothesis here is that the data do not differ from a normal distribution, and the test rejects it: the p-values (.000 for monthly spending and .001 for distance travelled) are both smaller than the 5% (0.05) significance level. We therefore conclude that the data are not normally distributed, and that distance travelled deviates less from normality than the monthly spending data.
Task 5: Significance of age-gender difference.
A. It is assumed that the data are normally distributed, which suggests a parametric test in the form of a t-test. An independent t-test is used when there are “two experimental conditions and different participants used in each condition” (Field, 2009:334).
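The statistic in the ‘Equal variances assumed’ row can be reproduced by hand. The sketch below computes the pooled-variance t statistic and its degrees of freedom for two independent groups; `pooled_t` is a hypothetical helper and the data are made up. The p-value would then come from the t distribution with those degrees of freedom, which is not in Python’s standard library and is therefore left out.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group2):
    """Independent-samples t statistic assuming equal variances.

    Returns (t, df, mean_difference), where mean_difference = mean(group1) - mean(group2),
    the same sign convention as the Mean Difference column in SPSS.
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean(group1) - mean(group2)
    return diff / std_error, df, diff


# Made-up ages: a negative mean difference means group 2 has the higher mean.
t, df, diff = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df, diff)  # -2.0 8 -2
```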
The null hypothesis (H0) is that there is no significant difference in age between the genders. The alternative hypothesis (H1) is that there is a significant difference in age between the genders.
The p-value indicates the level of probability at which we reject or do not reject the null hypothesis (Ibid), and it has to be linked to the direction of the hypothesis being tested. The analysis has two steps. First, the variances are compared (Levene’s test): if p < 0.05 (5%), H0 is rejected and the variances differ significantly; if p > 0.05 (5%), H0 is not rejected. Second, the means are compared: if the first test shows no significant difference in variances (H0 not rejected), we read the first row of the Independent Samples Test table.
B.
There were 126 female and 75 male respondents. According to the group statistics, males have a higher average age (41.53 years) than females (37.65 years).
To see whether the variances differ between the groups, we look at the p-value (Sig.) of Levene’s test. Here p = .248 > 0.05 (5%), so we do not reject the null hypothesis (H0); there is no significant difference between the variances of the groups. Accordingly, we read the first row (Equal variances assumed) of the t-test for equality of means and disregard the second row (Field, 2009:340). There, p = .000 < 0.05 (Sig. 2-tailed), so we reject the null hypothesis of equal means. Since the group means differ, we then look at the Mean Difference column.
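Levene’s test is itself a one-way ANOVA carried out on each observation’s absolute deviation from its group mean. The sketch below computes that F statistic for two or more groups; we assume the mean-centred version is the one SPSS reports, `levene_f` is a hypothetical helper, and the data are made up. Turning F into a p-value needs the F distribution and is omitted.

```python
from statistics import mean


def levene_f(*groups):
    """Mean-centred Levene statistic: one-way ANOVA F on |x - group mean|."""
    # Absolute deviation of every value from its own group's mean.
    deviations = []
    for g in groups:
        g_mean = mean(g)
        deviations.append([abs(x - g_mean) for x in g])
    k = len(deviations)
    n = sum(len(d) for d in deviations)
    group_means = [mean(d) for d in deviations]
    grand_mean = mean([z for d in deviations for z in d])
    between = sum(len(d) * (gm - grand_mean) ** 2 for d, gm in zip(deviations, group_means))
    within = sum((z - gm) ** 2 for d, gm in zip(deviations, group_means) for z in d)
    return (between / (k - 1)) / (within / (n - k))


# Made-up groups: equal spreads give F = 0, very different spreads a large F.
print(levene_f([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
print(levene_f([4, 5, 5, 5, 6], [0, 2, 5, 8, 10]))
```

A small F (and hence a large p, such as the .248 above) means the spreads are similar, which is why the ‘Equal variances assumed’ row is read.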
To conclude, there is a significant difference in the means but not in the variances: average age differs between the sexes. The mean difference is negative, which means that group 2 (males) has the higher mean age.
A normality test was also carried out to support the t-test and to give detailed data on average age by sex and the differences between these averages. This test supports the rationale for choosing a parametric rather than a non-parametric test, given the normality of the distribution.
The normality table supports the assumption that the data are normally distributed (and hence that the t-test is appropriate). H0 is that the data do not differ significantly from a normal distribution. The p-values for both males and females are greater than 5% (p = .511 > 0.05 and p = .135 > 0.05), so H0 is not rejected and the assumption of normality is supported. The charts also indicate approximately normal distributions, which means that the t-test was used correctly.
Task 6
This task investigates customer feedback on the level of appreciation and compares appreciation levels across genders. An appropriate test of association is the chi-square test (for two or more samples), which tests whether one variable depends on the other by measuring the relationship between attribute variables, usually nominal or ordinal variables whose values can be grouped or ranked (Venables, 2015, w3 p3).
First, because we have two unrelated samples, we create a Crosstabs table and state the null hypothesis (H0) and the alternative hypothesis (H1).
H0 - customer appreciation does not depend on gender (gender has no influence on the level of customer appreciation).
H1 - customer appreciation depends on gender.
The count (observed frequency) is the number of respondents in each combination of variable groups. The expected count (frequency) is calculated for each cell from the row and column totals. Expected frequencies in each cell have to be higher than 5 to avoid misleading results; this condition is met here, so the counts raise no issues (Bryman and Cramer, 2009:155).
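In the crosstabulation table, the standardised residuals all lie within ±3; since these residuals behave approximately like z-scores, no cell deviates extremely from its expected count, which supports the reliability of the test.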
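The expected counts, standardised residuals and Pearson chi-square statistic described above all follow from the observed table. The sketch below computes them for a made-up 2 x 2 table (not the survey’s gender-by-appreciation table); `chi_square` is a hypothetical helper. The p-value would come from the chi-square distribution with (rows - 1)(columns - 1) degrees of freedom and is not computed here.

```python
from math import sqrt


def chi_square(observed):
    """Pearson chi-square for a contingency table given as a list of rows.

    Returns (chi2, df, expected, std_residuals), where
    expected = row total * column total / grand total and
    standardised residual = (observed - expected) / sqrt(expected).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    chi2 = 0.0
    residuals = []
    for obs_row, exp_row in zip(observed, expected):
        res_row = []
        for o, e in zip(obs_row, exp_row):
            chi2 += (o - e) ** 2 / e
            res_row.append((o - e) / sqrt(e))
        residuals.append(res_row)
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, df, expected, residuals


# Made-up counts: rows = groups, columns = response categories.
chi2, df, expected, residuals = chi_square([[10, 20], [20, 10]])
print(round(chi2, 3), df)                     # 6.667 1
print(min(min(row) for row in expected) > 5)  # True: every expected count exceeds 5
```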
From the table alone, however, it is hard to tell whether customer appreciation depends on gender, because there are more female respondents (126 in total) than male respondents (75 in total). The dependency is therefore not evident without the chi-square test.
The Pearson chi-square test gives p = .505 > 0.05 (5%), so we do not reject H0 and conclude that customer appreciation does not depend on gender: the two variables are independent of one another.
Task 7: Customer behaviour
A. Measuring variables against each other.
Correlation indicates the direction and strength of the relationship between variables. It shows the interdependence of variables and distinguishes direct (positive), null and inverse (negative) relationships. In a scatter plot, each point represents a respondent’s position on the two variables being measured (Bryman and Cramer, 2009:212).
In this matrix plot, most of the scatter patterns are random, showing at most a weak correlation, except for one case with an obvious curvilinear, inverse (negative) relationship (Bryman and Cramer, 2009:215). The empty cells on the diagonal would plot each variable against itself, which by definition is a perfect correlation (r = 1). The lower triangle, which mirrors the upper triangle, suggests a potentially strong correlation between Distance Travelled and Number of Monthly Visits, because the points cluster closely. The remaining plots show random patterns with no clear direction, indicating a weak correlation or none at all. In conclusion, the plot suggests that monthly visits increase as distance travelled decreases.
B. Before applying Pearson’s correlation test, we should make sure that the relationship is linear. According to the scatter-plot matrix, the two variables (Monthly Visits and Distance Travelled) have a curvilinear relationship (the relationship is not a straight line but curves at some point), so it is non-linear; thus “it is not appropriate to apply a measure of linear correlation like Pearson’s r” (Bryman and Cramer, 2009:214).
To use Pearson’s test, the relationship should be linear and both variables should be normally distributed. First, the independent variable needs to be transformed to a logarithmic scale to perform a valid Pearson correlation test (Ibid), and the assumption of normal distribution must be tested. Otherwise, the outcome could be meaningless or misleading.
Testing normality (Appendix 2)
According to the Shapiro-Wilk test of normality, p < 0.05 for both average distance travelled (Sig. = .001) and number of monthly visits (Sig. = .000), so neither is normally distributed. This rejects the null hypothesis that the data do not differ from a normal distribution.
In this case, the two variables are neither linearly related nor normally distributed, so even after a logarithmic transformation the test might not provide a meaningful outcome.
Performing linear correlation
To measure correlation we need to explore covariance, which indicates how two variables vary together. Pearson’s correlation coefficient (r) is a standardised measure of this covariance. If r = 1, there is a perfect positive correlation between the two variables x and y; if r > 0, there is a direct (positive) relationship; if r = 0, there is no linear relationship; and if r < 0, there is an inverse (negative) relationship. A Pearson test is also used because the data are continuous (Venables, 2015, l4, p2).
As the table shows, r < 0, so there is an inverse relationship between the two variables. The relationship is also strong, because r = -.871, which is close to -1 (Bryman and Cramer, 2009:217).
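Pearson’s correlation coefficient (written r here) is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it from that definition and, following the suggestion above to put the independent variable on a logarithmic scale, compares r before and after the transformation on made-up curvilinear data (not the FreshCo figures); `pearson_r` is a hypothetical helper.

```python
from math import log, sqrt


def pearson_r(xs, ys):
    """Pearson correlation: sample covariance / (sd_x * sd_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)


# Made-up data: visits fall quickly at short distances, then level off (curvilinear).
distance = [1, 2, 4, 8, 16, 32]
visits = [12, 10, 8, 6, 4, 2]

raw = pearson_r(distance, visits)
logged = pearson_r([log(d) for d in distance], visits)
print(round(raw, 3))     # strong but imperfect linear fit on the raw scale
print(round(logged, 3))  # -1.0: the relationship is exactly linear in log(distance)
```

In this toy example visits fall by the same amount each time distance doubles, so the log transformation turns the curve into a straight line and |r| rises to 1.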
Regression
Regression models how a dependent variable changes with one or more independent variables, taking into account the accuracy of measures and outliers (Ibid:229). The null hypothesis is that the regression is not significant; the alternative hypothesis is that it is significant.
Large values of R² indicate that the regression model fits the data, while small values indicate a poor explanation (Field, 2009:268). Here R² = .556, so the model explains about 56% of the variance in the dependent variable, which indicates that it fits the data and that the regression line fits the scatter plot reasonably well.
The ANOVA table shows a significant relationship between the two variables, because p < 5%. The results in the Coefficients table are therefore valid and the model is appropriate. The model sum of squares (SSM) is large relative to the residual sum of squares (SSR), which means that the model explains the variable’s behaviour better than its mean does.
H0 here is that the constant does not play a significant role within the model; the same null hypothesis applies to the predictor’s coefficient. The p-values for both are below 5%, so both null hypotheses are rejected and we accept the model’s prediction. The coefficient β indicates that each additional mile travelled is associated with a decrease of .574 in the number of monthly visits.
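For a single predictor, the figures in the Model Summary, ANOVA and Coefficients tables follow from the least-squares formulas. The sketch below computes the intercept (constant), slope (B), SSM, SSR, R² and the F ratio; `simple_regression` is a hypothetical helper, the data are made up, and p-values, which need the F and t distributions, are omitted.

```python
def simple_regression(xs, ys):
    """Least-squares fit y = b0 + b1*x with the sums of squares used in the ANOVA table."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx     # slope: change in y per one-unit increase in x
    b0 = my - b1 * mx  # intercept (the constant)
    predicted = [b0 + b1 * x for x in xs]
    ss_total = sum((y - my) ** 2 for y in ys)
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # SSR
    ss_model = ss_total - ss_residual                               # SSM
    r_squared = ss_model / ss_total
    f_ratio = (ss_model / 1) / (ss_residual / (n - 2))              # df: 1 and n - 2
    return {"b0": b0, "b1": b1, "SSM": ss_model, "SSR": ss_residual,
            "R2": r_squared, "F": f_ratio}


# Made-up data (x = miles travelled, y = monthly visits), not the survey results.
fit = simple_regression([1, 2, 3, 4, 5], [5, 4, 5, 4, 2])
print({k: round(v, 3) for k, v in fit.items()})
# {'b0': 5.8, 'b1': -0.6, 'SSM': 3.6, 'SSR': 2.4, 'R2': 0.6, 'F': 4.5}
```

The negative slope plays the same role as the β discussed above: in this toy data each extra mile is associated with 0.6 fewer visits, and R² = SSM / (SSM + SSR).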
C. Multiple regression analysis
Small values of R² indicate that the regression model gives a poor explanation of the data and is unlikely to fit it. Here R² = .011, which is small: the model explains only about 1% of the variance, so the regression does not fit the data (Ibid).
The ANOVA table shows that the Coefficients table cannot be relied on, because the p-value is greater than 5%, indicating that the regression model is not significant.
The β coefficients cannot be interpreted either, because each of their p-values is higher than 5%, so we cannot reject the null hypothesis that age, distance and number of visits play no significant role within the model. Other independent variables should be introduced in order to predict customers’ monthly expenditure.
Appendices
Appendix 1
Appendix 2
Works Cited
Bryman, A. and Cramer, D., 2009, Quantitative Data Analysis with SPSS 14, 15 & 16: A Guide for Social Scientists.
Field, A., 2009, Discovering Statistics Using SPSS, 3rd edition, SAGE Publications Ltd.
SAGE Publications, 2015, Identifying and Addressing Outliers, Module 5. Available at: http://www.sagepub.com/upm-data/52387_MOD_5.pdf (Accessed: 10/05/2015).
Venables, H., 2015, Quantitative Methods and Data Analysis (i) (MAN00029M), P/G module, University of York.

Data Analysis for Graduate Studies SummaryData Analysis for Graduate Studies Summary
Data Analysis for Graduate Studies Summary
 
initial postWhat are the characteristics, uses, advantages, and di.docx
initial postWhat are the characteristics, uses, advantages, and di.docxinitial postWhat are the characteristics, uses, advantages, and di.docx
initial postWhat are the characteristics, uses, advantages, and di.docx
 
EDA_ Bank_Loan_Case_Study_PPT.pdf
EDA_ Bank_Loan_Case_Study_PPT.pdfEDA_ Bank_Loan_Case_Study_PPT.pdf
EDA_ Bank_Loan_Case_Study_PPT.pdf
 
Chapter 03
Chapter 03Chapter 03
Chapter 03
 

Final assessment QRM

Task 1
Before any data manipulation, the Data View and Variable View in SPSS must be set up correctly so that SPSS computes its output properly. The table below lists each variable with the measurement level it originally had and the level it was changed to.

Name | Label | Values | Original measure | New measure | Rationale
Obs | Id number | None | Nominal | Scale | Values are numeric but not measurable, like ID numbers and phone numbers (could also be Nominal).
Gender | Gender | 0=Female, 1=Male | Nominal | Nominal | Values are numbers that represent categories with no order, e.g. 1-Blue Car, 2-Cat, 0-Male, 11-Female.
Age | Age (years) | None | Nominal | Scale | Values are measurable numbers, e.g. height in cm.
Status | Marital Status | 1=Single, 2=Married, 3=Divorced, 4=Widowed | Nominal | Nominal | Unordered categories (as for Gender).
Occupation | Occupation | 1=Student, 2=Employed, 3=Self-employed, 4=Retired | Nominal | Nominal | Unordered categories (as for Gender).
AvgMonthlySpending | Average Monthly Spend (GBP) | None | Nominal | Scale (custom currency) | Values are measurable monetary amounts.
MonthlyVisits | Number of Monthly Visits | None | Nominal | Scale | Measurable numbers (as for Age).
Distance | Distance Travelled (miles) | None | Nominal | Scale | Measurable numbers (as for Age).
Car | Vehicle Ownership | 0=No, 1=Yes | Nominal | Nominal | Unordered categories (as for Gender).
Appreciation | Customer Appreciation | 1=Very Low, 2=Low, 3=Indifferent, 4=High, 5=Very High | Nominal | Ordinal | Values are numbers that represent a ranking or order, e.g. Likert scale values 1-5 or 1-7.

Task 2
A. The bar chart identifies the target consumers of the FreshCo retail centre by analysing two consumer characteristics: marital status and occupation. A cross-tabulation of the two variables (Table 2.1) is therefore used to produce an appropriate bar chart.
Table 2.1
Table 2.2
Table 2.2 shows that there are 201 respondents and two modes, one for each variable.
The chart shows that most of FreshCo's consumers are employed (116 of 201 respondents). Married respondents form the largest marital-status group (89 of 201 respondents). The modal categories are therefore 'married' for status and 'employed' for occupation, because these values occur most frequently (Appendix 2) (Field, 2009:21).
B. The target-consumer analysis uses the same characteristics, status and occupation. This time it examines how the groups differ with regard to car ownership.
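The cross-tabulation and mode-finding in parts A and B can be sketched in Python. This is an illustration under stated assumptions, not the SPSS Crosstabs procedure itself. The records are invented toy values, not the FreshCo survey data, and `crosstab` and `modes` are hypothetical helper names. `modes` returns every category tied for the highest count, so a variable with two modes would show both.

```python
from collections import Counter

# Hypothetical toy records (status, occupation, car_owner); NOT the FreshCo
# survey data, only an illustration of the counting logic.
records = [
    ("Married", "Employed", 1),
    ("Married", "Employed", 1),
    ("Single", "Student", 1),
    ("Divorced", "Self-employed", 0),
    ("Widowed", "Retired", 0),
    ("Single", "Employed", 1),
    ("Married", "Retired", 1),
]


def crosstab(rows, row_idx, col_idx):
    """Count joint occurrences of two categorical columns (one cell per pair)."""
    return Counter((r[row_idx], r[col_idx]) for r in rows)


def modes(values):
    """Return every category tied for the highest frequency, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


# Part A: status x occupation table and the modal category of each variable
table = crosstab(records, 0, 1)
print(table[("Married", "Employed")])   # 2
print(modes(r[0] for r in records))     # ['Married']
print(modes(r[1] for r in records))     # ['Employed']

# Part B: the same table split by car ownership (0 = No, 1 = Yes)
for owner in (0, 1):
    layer = crosstab([r for r in records if r[2] == owner], 0, 1)
    print("Car =", owner, dict(layer))
```

Splitting the records by `car_owner` before tabulating mirrors adding a layer variable to a two-way table.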
Two kinds of respondents do not own a car: divorced respondents who are employed or self-employed, and widowed respondents who are retired. Among divorced respondents, the self-employed sub-group is the largest group without a car. By contrast, married employed respondents form the largest group of car owners. Single students are the second largest group of car owners, and single employed respondents are the smallest.
C. The largest group of FreshCo's customers are married, employed car owners. Issues such as an insufficient number of parking spaces could therefore arise. Convenient opening hours could also make a significant difference for working individuals. Marital status may also indicate the presence of children, and so a need for children's facilities such as playgrounds and food courts on site.
Task 3
A. Consumer spending in relation to consumer characteristics.
Extreme values (outliers) occur for male students and self-employed females. Outliers are extreme values that deviate from the rest of the responses. Here, three respondents give outlying answers for average monthly spending. The numbers next to the outliers give the respondent's row number (SAGE, 2015). In the self-employed female group, one person spends more per month (£522.59) than the rest. In the male student group, two respondents spend more than the rest of the group (£219.84 and £225.84).
Medians vary across occupations. Across genders, however, the medians are not significantly different (they overlap). Both employed and self-employed males and females spend more than the other groups. The difference is significant because their confidence intervals do not overlap.
There is also some difference between the employed and self-employed groups, because their boxes barely overlap. For employed and self-employed women the difference in spending is smaller, as their boxes overlap slightly. The self-employed group's lower median shows lower spending than the employed group, whose median is higher. The interquartile ranges differ slightly in length and position, which indicates different dispersion between the two groups (the self-employed group's range is smaller) (Field, 2009:100-2).
There are no significant differences between the expenditure of students and retired individuals, because their boxes overlap. However, the spread of the student interquartile range differs across gender. The male group's interquartile range is smaller, and male students are less likely than female students to spend more money (Ibid).
B. Average monthly expenditure according to level of appreciation.
The average expenditure of customers with 'very high' appreciation differs significantly from the rest. Its median and box (including confidence intervals) lie far from the others and do not overlap. The interquartile ranges of the 'very low' and 'low' groups differ considerably from all the others, which indicates a wider dispersion of data (Field, 2009:101-2). Some box plots also show skewed, asymmetric data, which needs to be examined more closely with descriptive statistics.
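The box-plot outlier rule used in parts A and B and the per-group descriptive statistics discussed below can be sketched in Python. This sketch rests on several assumptions. As I understand SPSS's convention, box plots use Tukey's hinges and mark cases more than 1.5 box lengths from the box as outliers and more than 3 box lengths as extremes. The skewness is the adjusted Fisher-Pearson coefficient (G1), the form commonly documented for SPSS. The IQR uses Python's default 'exclusive' quartile method, the (n+1)p rule, which I assume matches SPSS's default weighted-average percentiles. All data values and helper names (`tukey_hinges`, `flag_outliers`, `describe`) are hypothetical.

```python
import statistics
from collections import defaultdict


def tukey_hinges(values):
    """Lower and upper hinges: medians of the lower and upper halves
    (each half includes the median when n is odd)."""
    xs = sorted(values)
    half = (len(xs) + 1) // 2
    return statistics.median(xs[:half]), statistics.median(xs[len(xs) - half:])


def flag_outliers(values):
    """Map 1-based case number to 'outlier' (beyond 1.5 box lengths from
    the box) or 'extreme' (beyond 3 box lengths)."""
    q1, q3 = tukey_hinges(values)
    box = q3 - q1
    flags = {}
    for case, x in enumerate(values, start=1):
        if x < q1 - 3 * box or x > q3 + 3 * box:
            flags[case] = "extreme"
        elif x < q1 - 1.5 * box or x > q3 + 1.5 * box:
            flags[case] = "outlier"
    return flags


def describe(values):
    """Mean, median, sample SD, IQR and adjusted skewness (G1) of one group."""
    xs = list(values)
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)                  # n - 1 denominator
    q1, _, q3 = statistics.quantiles(xs, n=4)  # default 'exclusive' method
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in xs)
    return {"n": n, "mean": mean, "median": statistics.median(xs),
            "sd": sd, "iqr": q3 - q1, "skewness": skew}


# Hypothetical monthly spending (GBP) for one group
spending = [150.2, 160.5, 171.0, 165.3, 158.9, 162.4, 400.0]
print(flag_outliers(spending))   # {7: 'extreme'}

# Hypothetical (appreciation, spending) pairs, grouped by appreciation level
pairs = [("Very High", 190.0), ("Very High", 199.9), ("Very High", 201.5),
         ("Low", 250.0), ("Low", 320.0), ("Low", 410.0)]
groups = defaultdict(list)
for level, amount in pairs:
    groups[level].append(amount)
for level, amounts in groups.items():
    print(level, describe(amounts))
```

A positive `skewness` means a longer right tail and a negative value a longer left tail. This is how the signs for each appreciation group are read.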
The descriptive statistics show that the means and medians of the 'low', 'very low', 'indifferent' and 'high' appreciation groups differ little from each other. However, the mean (197.8941) and median (199.9549) of the 'very high' group are considerably lower than those of the rest.
The standard deviation is also larger for the 'very low' and 'low' groups, which confirms numerically that their values are spread more widely around the mean. The interquartile ranges of these two groups also differ markedly from the rest. There is slight positive skewness for the 'high' and 'indifferent' groups and slight negative skewness for the others, which indicates slight asymmetry in the distributions (Field, 2009:19).
Customers who spend the least have the highest appreciation. Customers who spend the most are indifferent or have low or very low appreciation.
Task 4: Distribution of customers' monthly expenditure.
According to the histogram, the average monthly spending data is not normally distributed. The distribution is flat with a negative skew. The bars depart from the normal curve and are clearly split into two groups.
The Normal Q-Q plot shows deviation towards the tails. It plots the observed values against the values expected under a normal distribution. The points lie fairly far from the line and even cross it, which indicates that the distribution is not normal. The points lie on the line only around spending values of 180.000 and 5300.00. The detrended Normal Q-Q plot shows the same information with the trend line removed, plotting each point's deviation from the line. It makes the departure from normality even clearer.
The box plot shows no outliers. The middle of the data is not symmetrically distributed, because the median is not centred in the box.
Distribution of distance travelled
The distance-travelled data appears closer to a normal distribution. However, the Normal Q-Q plot shows slight deviation towards the end of the tail. The detrended Normal Q-Q plot reveals that the data is less normally distributed than it first appears. The box plot has three outliers. The median is almost centred, so the middle of the data is almost symmetrically distributed.
Normality is hard to judge without carrying out a test of normality. The table shows skewness in both cases, which indicates a non-normal distribution: more in the first case (-.900) than in the second (.612).
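The Normal Q-Q and detrended Q-Q plots described above can be reproduced in outline with Python's standard library. Each sorted observation is paired with the normal quantile expected at its rank. Two details are assumptions and may not match SPSS exactly. The first is the plotting position: Blom's (i - 3/8)/(n + 1/4) is used here as one common choice. The second is the scale of the detrended deviations: SPSS may plot them on a standardised scale rather than in the data's own units. The helper name `qq_points` and the sample values are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def qq_points(values):
    """Return (observed, expected, deviation) triples for a Normal Q-Q plot.

    expected:  normal quantile, in the data's own units, for the same rank,
               using Blom's plotting position (i - 3/8) / (n + 1/4)
    deviation: observed minus expected (the detrended plot's vertical axis)
    """
    xs = sorted(values)
    n = len(xs)
    normal = NormalDist(fmean(xs), stdev(xs))  # normal with the sample mean and SD
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.375) / (n + 0.25)
        expected = normal.inv_cdf(p)
        points.append((x, expected, x - expected))
    return points


# Hypothetical sample: points near the line have deviations close to 0
for obs, exp, dev in qq_points([120.0, 150.0, 180.0, 210.0, 300.0]):
    print(f"{obs:8.2f} {exp:8.2f} {dev:+8.2f}")
```

A systematic curve in the deviation column, rather than random scatter around zero, is what signals skewness in the detrended plot.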
According to the Shapiro-Wilk test (which is more reliable), the significance values (p < 0.05) show that neither average monthly spending nor distance travelled is normally distributed. The null hypothesis is that the data does not differ from a normal distribution. The p-values (.000 and .001) are smaller than the 5% (0.05) significance level, so the null hypothesis is rejected. We conclude that neither variable is normally distributed, and that distance travelled deviates less from normality than monthly spending.
Task 5: Significance of the age difference between genders.
A. The data is assumed to be normally distributed, which suggests a parametric test, here an independent-samples t-test. This test is used when there are "two experimental conditions and different participants used in each condition" (Field, 2009:334). The null hypothesis (H0) is that there is no difference in mean age between female and male customers. The alternative hypothesis (H1) is that there is a difference.
The p-value is the probability of obtaining a result at least as extreme as the one observed if H0 were true. It is compared with the significance level to decide whether to reject H0 (Ibid). The p-value must be interpreted according to the direction of the hypothesis being tested (one- or two-tailed).
The analysis has two steps. First, Levene's test checks the variances. If p < 0.05 (5%), its H0 of equal variances is rejected and the variances differ significantly; if p > 0.05, H0 is not rejected. Second, the means are compared. If Levene's test shows no significant difference (its H0 is not rejected), we read the first row (Equal variances assumed) of the Independent Samples Test table.
B.
There were 126 female and 75 male respondents. According to the group statistics, males have a higher mean age (41.53 years) than females (37.65 years).
To check whether the variances of the two groups differ, we look at the p-value (Sig.) of Levene's test. Here p = .248 > 0.05 (5%), so we do not reject the null hypothesis (H0) of equal variances: the variances of the groups do not differ significantly. Accordingly, we read the first row (Equal variances assumed) of the t-test for equality of means and disregard the second row (Field, 2009:340).
Here p = .000 < 0.05 (Sig. 2-tailed), so we reject the null hypothesis of equal means. Mean age differs between the groups, and we look at the Mean Difference column. In conclusion, the groups differ significantly in their means but not in their variances, so average age differs between the sexes. The mean difference is negative, which means that group 2 (males) has the higher mean age.
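The two steps described above, Levene's test followed by the equal-variances t-test, can be sketched from their textbook formulas in Python. This is a sketch, not SPSS's implementation. Levene's statistic here uses absolute deviations from each group mean, which I understand to be the form SPSS reports. Converting the statistics to p-values needs the F and t distributions, which Python's standard library does not provide. If SciPy is available, which is an assumption, `scipy.stats.levene(a, b, center='mean')` and `scipy.stats.ttest_ind(a, b, equal_var=True)` return both the statistics and the p-values. Note that `scipy.stats.levene` defaults to `center='median'`. The ages and helper names are hypothetical.

```python
from statistics import fmean, variance


def levene_f(*groups):
    """Levene's F using absolute deviations from each group's mean.
    Returns (F, (df_between, df_within))."""
    z = [[abs(x - fmean(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    z_means = [fmean(zi) for zi in z]
    z_grand = fmean([v for zi in z for v in zi])
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    f = (n_total - k) / (k - 1) * between / within
    return f, (k - 1, n_total - k)


def pooled_t(a, b):
    """Independent-samples t with equal variances assumed.
    Returns (t, df, mean difference a - b)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    diff = fmean(a) - fmean(b)
    return diff / se, n1 + n2 - 2, diff


# Hypothetical ages; females listed first, males second
female = [31, 35, 38, 40, 36, 42, 33]
male = [39, 44, 41, 45, 38, 43]
print(levene_f(female, male))
t, df, diff = pooled_t(female, male)
print(round(t, 3), df, round(diff, 2))  # diff < 0: the second group is older on average
```

The sign of the mean difference depends only on which group is listed first. This is why a negative value indicates that the second group has the higher mean.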
A normality test was also carried out to support the t-test. It gives detailed information on the age distribution of each sex. The test below supports the choice of a parametric rather than a non-parametric test, because the data is normally distributed.
The normality table supports the assumption that the data is normally distributed, and hence that the t-test is appropriate. H0 is that the data does not differ significantly from a normal distribution. The p-values for both males and females exceed 5% (p = .511 > 0.05 and p = .135 > 0.05). H0 is therefore not rejected, and there is no evidence against normality. The charts below also support normality of the distribution, which indicates that the t-test was used correctly.
Task 6
This task investigates customer feedback in terms of customers' level of appreciation, and compares that level across genders. The appropriate test of association is the chi-square test (for two or more samples). It tests whether two attribute variables are related, i.e. whether one depends on the other. Attribute variables are usually nominal or ordinal variables that can be grouped or ranked (Venables, 2015, w3 p3).
First, because we have two unrelated samples, we create a Crosstabs table and state the null hypothesis (H0) and the alternative hypothesis (H1).
H0: customer appreciation does not depend on gender (gender has no influence on the level of customer appreciation).
H1: customer appreciation depends on gender.
The count (observed frequency) is the number of respondents in each combination of groups. The expected count is calculated from the row and column totals as (row total × column total) / grand total. The expected frequency in each cell should be at least 5 to avoid misleading results. This condition is met here, so the counts pose no problem (Bryman, Cramer, 2009:155).
In the table above, the standardised residuals lie within ±3, so no single cell departs markedly from its expected count. From the table alone, however, it is hard to tell whether customer appreciation depends on gender, because there are more female respondents (126) than male respondents (75). The dependency is therefore not evident without the chi-square test.
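The expected counts, standardised residuals and chi-square statistic described above can be computed directly in Python. This is a minimal sketch with hypothetical helper names and invented cell counts, not the SPSS output. The standardised residual is (observed - expected)/√expected. For a general p-value the chi-square distribution is needed, which Python's standard library lacks. The helper `chi2_p_even_df` therefore uses the closed form that holds only for even degrees of freedom. That covers the df = 4 of a 2 × 5 gender-by-appreciation table.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of observed
    counts. Returns (chi2, df, expected counts, standardised residuals)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    expected, residuals = [], []
    for i, row in enumerate(table):
        e_row, r_row = [], []
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand    # (row total x column total) / N
            e_row.append(e)
            r_row.append((observed - e) / math.sqrt(e))  # standardised residual
            chi2 += (observed - e) ** 2 / e
        expected.append(e_row)
        residuals.append(r_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected, residuals


def chi2_p_even_df(x, df):
    """Upper-tail chi-square p-value; this closed form only holds for even df."""
    if df % 2:
        raise ValueError("closed form needs even degrees of freedom")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical counts: rows = female, male; columns = appreciation levels 1..5
counts = [[12, 18, 25, 20, 15],
          [8, 10, 14, 12, 6]]
chi2, df, expected, residuals = chi_square_independence(counts)
print(round(chi2, 3), df, round(chi2_p_even_df(chi2, df), 3))
print(all(e >= 5 for row in expected for e in row))  # expected-count rule of thumb
```

Checking the expected counts before reading the p-value follows the order of reasoning in the text above.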
The Pearson chi-square test gives p = .505 > 0.05 (5%), so we do not reject H0. We conclude that customer appreciation does not depend on gender: the two variables are independent of one another.
Task 7: Customer behaviour
A. Measuring variables against each other.
Correlation indicates the direction and strength of the relationship between variables. It shows how variables depend on each other and distinguishes direct (positive), null and inverse (negative) relationships. In a scatter plot, each point represents a respondent's position on the two variables being measured (Bryman, Cramer, 2009:212).
Most panels of this matrix plot show randomly scattered patterns with, at most, weak correlation. The one exception shows an obvious inverse, curvilinear relationship (Bryman, Cramer, 2009:215). The diagonal panels contain no data because each would plot a variable against itself, which is a perfect correlation (r = 1). The lower triangle mirrors the upper triangle. In it we can see a potentially strong correlation between Distance Travelled and Number of Monthly Visits, because the points are tightly clustered. The remaining panels show random patterns without any direction, which indicates weak or no correlation. In conclusion, the plot suggests that the number of monthly visits increases as the distance travelled decreases.
B. Before applying Pearson's correlation test, we should make sure that the relationship is linear. According to the scatter-plot matrix, the two variables (Monthly Visits and Distance Travelled) have a curvilinear relationship: it is not straight and bends at some point. The relationship is therefore non-linear, and "it is not appropriate to apply a measure of linear correlation like Pearson's r" (Bryman, Cramer, 2009:214).
To use Pearson's test, the relationship should be linear and both variables should be normally distributed. We therefore first transform the independent variable onto a logarithmic scale to perform a valid Pearson correlation test (Ibid), and test the assumption of normality. Otherwise, the outcome could be misleading.
Testing normality (Appendix 2)
According to the Shapiro-Wilk test of normality, neither distance travelled (Sig. = .001) nor number of monthly visits (Sig. = .000) is normally distributed, since p < 0.05 in both cases. This rejects the null hypothesis that the data does not differ from a normal distribution. The two variables are therefore neither linearly related nor normally distributed. Even after the logarithmic transformation, the test might not give a meaningful result.
Performing linear correlation
To measure correlation we examine covariance, which indicates how two variables vary together. Pearson's correlation coefficient (r) is a standardised covariance. If r = 1, there is a perfect positive correlation between the two variables x and y. If r > 0, there is a direct relationship between x and y. If r = 0, there is no linear relationship, and if r < 0, there is an inverse relationship. Pearson's test is also appropriate because the data is continuous (Venables, 2015, l4, p2).
The table shows r < 0, so there is an inverse relationship between the two variables. The relationship is also strongly negative, because r = -.871 (close to -1) (Bryman, Cramer, 2009:217).
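The effect of the logarithmic transformation on Pearson's r, as described above, can be illustrated with a short Python sketch. The data is invented: visits fall with distance along a curve, so r on the raw scale is strong but not perfect. After the log transform of the independent variable the relationship becomes exactly linear. The helper names `pearson_r` and `log_transform` are hypothetical, and the log transform needs strictly positive values.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson's r: the covariance of x and y divided by the product of
    their standard deviations (the n - 1 terms cancel)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_transform(values):
    """Natural-log transform; every value must be strictly positive."""
    return [math.log(v) for v in values]


# Hypothetical data: visits fall with distance along a curve, not a line
distance = [1, 2, 4, 8, 16]   # miles
visits = [12, 9, 6, 3, 0]     # visits per month
print(round(pearson_r(distance, visits), 3))                 # -0.933
print(round(pearson_r(log_transform(distance), visits), 3))  # -1.0
```

The raw-scale r understates the strength of a curvilinear relationship. Checking linearity first, as the text recommends, guards against this.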
Regression
Regression analyses how a dependent variable changes with one or more independent variables, taking into account the accuracy of the measures and outliers (Ibid, 229). The null hypothesis is that the regression is not significant, and the alternative hypothesis is that it is significant.
A large R² indicates that the regression model fits the data well, while a small value indicates a poor explanation (Field, 2009:268). Here R² = .556, so the model explains about 55.6% of the variance and the regression line fits the scatter plot. In the ANOVA table p < 5% (Sig.), so there is a significant relationship between the two variables. The results in the Coefficients table are therefore valid and the model is appropriate. The model sum of squares (SSM) is large relative to the residual sum of squares (SSR). This means the model explains the variable's behaviour better than its mean does.
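The quantities in the regression paragraph above (slope, intercept, SSM, SSR, R² and the ANOVA F ratio) can be computed for a simple linear regression as in the Python sketch below. It uses ordinary least squares on invented x and y values, and `simple_regression` is a hypothetical helper name. R² equals SSM / SST, and in simple regression it also equals r². The F ratio's p-value would need the F distribution, which is not computed here.

```python
from statistics import fmean


def simple_regression(x, y):
    """Least-squares line y = b0 + b1 * x, with the sums of squares that an
    ANOVA table reports (SSM = model, SSR = residual)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    fitted = [b0 + b1 * a for a in x]
    ss_t = sum((b - my) ** 2 for b in y)                 # total
    ss_r = sum((b - f) ** 2 for b, f in zip(y, fitted))  # residual
    ss_m = ss_t - ss_r                                   # model
    f_ratio = ss_m / (ss_r / (len(x) - 2))  # model df = 1, residual df = n - 2
    return {"b0": b0, "b1": b1, "r2": ss_m / ss_t,
            "ss_m": ss_m, "ss_r": ss_r, "F": f_ratio}


# Hypothetical x and y values
fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit)  # b1 = 0.6, b0 = 2.2, r2 = 0.6, F = 4.5 (up to floating-point rounding)
```

A large SSM relative to SSR gives a large F ratio, which is the comparison the ANOVA table summarises.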
H0 here is that the constant does not play a significant role in the model, and likewise for the predictor's coefficient. The p-values for both are below 5%, so both null hypotheses are rejected and the model's predictions can be accepted. The coefficient indicates that each additional mile travelled is associated with a decrease of .574 in the number of monthly visits (β = -.574 visits per mile).
C. Multiple regression analysis
A small R² indicates that the regression model explains the data poorly and is unlikely to fit it. Here R² = .011, which is very small. The model explains only about 1.1% of the variance, so the regression does not fit the data or the scatter plot (Ibid).
The ANOVA table shows that the model is not significant, because its p-value exceeds 5%. The Coefficients table is therefore not reliable. The β values cannot be interpreted either, because all of their p-values exceed 5%. We therefore fail to reject the null hypothesis that age, distance and number of visits play no significant role in the model. Other independent variables should be introduced in order to predict customers' monthly expenditure.
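The multiple regression in part C fits monthly expenditure on several predictors at once. It can be sketched in Python by solving the ordinary least-squares normal equations X'X b = X'y with Gaussian elimination. This is an illustrative stand-in, not how SPSS computes the model. The normal equations are less numerically stable than the QR-based methods statistical packages typically use. The helper names (`solve`, `multiple_regression`) and all data values are hypothetical.

```python
from statistics import fmean


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with
    partial pivoting (inputs are copied, not modified)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(predictors, y):
    """OLS fit of y on several predictors plus an intercept.
    Returns ([b0, b1, ..., bk], R squared)."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    my = fmean(y)
    ss_t = sum((v - my) ** 2 for v in y)
    ss_r = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return b, 1 - ss_r / ss_t


# Hypothetical age, distance (miles), monthly visits and spending (GBP)
age = [25, 34, 41, 52, 29, 60]
distance = [2.0, 5.5, 3.2, 8.1, 1.4, 6.3]
visits = [8, 3, 5, 2, 9, 4]
spending = [210.0, 180.5, 250.3, 199.9, 230.1, 205.4]
coefs, r2 = multiple_regression([age, distance, visits], spending)
print([round(c, 3) for c in coefs], round(r2, 3))
```

An R² close to 0 from such a fit corresponds to the situation in the text, where the chosen predictors explain almost none of the variance in spending.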
Works Cited
Bryman, A. and Cramer, D., 2009. Quantitative Data Analysis with SPSS 14, 15 & 16: A Guide for Social Scientists.
Field, A., 2009. Discovering Statistics Using SPSS, 3rd edition. SAGE Publications Ltd.
SAGE Publications, 2015. Identifying and Addressing Outliers, Module 5. Available at: http://www.sagepub.com/upm-data/52387_MOD_5.pdf (accessed 10/05/2015).
Venables, 2015. Quantitative Methods and Data Analysis (i) (MAN00029M), P/G Module, University of York.