Business Statistics assignment 2014

MSc Marketing and Business Analysis
Marketing Statistics
17/11/2014
B064536

Table of contents
Section number and title
1. Description of secondary school student alcohol consumption dataset
2. Descriptive and summary statistics
3. One-tailed test about a population proportion
4. Chi-square test of association
5. Correlation
6. Further analysis
7. Limitation
8. Conclusion
List of references
Appendices
Appendix 1. Survey questions
Appendix 2. Cross tab
List of tables
Table 1. Variables used
Table 2. Summary statistics
Table 3. Gender frequency
Table 4. School year
Table 5. Ever had a proper alcoholic drink
Table 6. Chi-square test
Table 7. Symmetric measures
Table 8. Correlation between family attitude and drinking frequency
List of figures
Figure 1. Sample gender in percentage
Figure 2. School year in percentage
Figure 3. Ever had a proper alcoholic drink

1. Description of secondary school student alcohol consumption dataset
The secondary data were obtained from the UK Data Service (2014). The dataset was collected
through a survey conducted by the National Centre for Social Research (2012) on secondary school
pupils (aged 11 to 15). The survey (see Appendix 1) aimed to gain insight into the number of
young alcohol drinkers and their drinking behaviour. A total of 7589 valid responses were
gathered.
The key reason for selecting this dataset is to gain insight into student drinkers so as to develop
effective strategies to curb underage drinking. A body of evidence suggests that drinking at a
young age, in particular heavy and regular drinking, can result in physical or mental problems
and put children at risk of alcohol-related accident or injury. More broadly, it is also associated
with missing or falling behind at school and with violent and antisocial behaviour. It is therefore
necessary to develop strategies to tackle problem drinking at both national and local level.
This report first provides descriptive and summary statistics about the sample. Next, it
conducts a one-tailed population proportion hypothesis test to investigate the proportion of UK
pupils who have drunk alcohol before. This is followed by a Chi-square test to ascertain whether
peer pressure and student alcohol consumption frequency are associated. Correlation analysis is
then conducted to investigate the strength and type of relationship between family attitude and
student drinking frequency. The report also discusses further analysis and the limitations of
the dataset.

2. Descriptive and Summary Statistics
Table 1. Variables used
Variable name                            Measurement   Analysis conducted
Age                                      Ratio         Summary statistics
Gender                                   Nominal       Summary statistics
School year                              Ordinal       Summary statistics
Units of alcohol drunk in last 7 days    Ratio         Summary statistics
Ever had a proper alcoholic drink        Nominal       Summary statistics and hypothesis test
Peer pressure                            Nominal       Chi-square
Family attitude to pupil drinking        Interval      Correlation
Monthly usual drinking frequency         Interval      Summary statistics, Chi-square, correlation
Table 2. Summary Statistics
Statistic         Age 11-15   Units of alcohol drunk in last 7 days   Usual drinking frequency (monthly)
N Valid           7589        7172                                    7314
N Missing         0           417                                     275
Mean              13.1735     1.4194                                  6.7829
Median            13.0000     1.0000                                  8.0000
Mode              15.00       1.00                                    8.00
Std. Deviation    1.39074     1.47192                                 1.68427
Minimum           11.00       1.00                                    1.00
Maximum           15.00       8.00                                    8.00
From Table 2, it can be observed that the mean age of the sample is 13.17 years. Among students
who had drunk alcohol before, mean consumption in the last 7 days was 1.42 units. The students'
mean monthly drinking frequency is 6.78 times.
The three variables analysed in Table 2 have sample standard deviations of 1.684 (usual drinking
frequency), 1.472 (units of alcohol drunk) and 1.391 (age). This shows that there is little
variability in each variable analysed. The sample standard deviation is calculated using the
formula:
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
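The sample standard deviation formula can be sketched in Python as follows. This is a minimal illustration only: the function name `sample_std` and the list of ages are hypothetical and are not taken from the survey data.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation using the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical ages, made up for illustration (not from the dataset).
ages = [11, 12, 13, 13, 14, 15, 15]
s = sample_std(ages)
print(round(s, 4))

# The standard library's statistics.stdev uses the same n - 1 definition.
print(math.isclose(s, statistics.stdev(ages)))
```

The manual calculation and `statistics.stdev` should agree, since both divide the sum of squared deviations by n - 1.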
Next, some sample characteristics will be presented using frequency tables and charts.

Table 3. Gender Frequency
                Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Boy     3809        50.2      50.2            50.2
        Girl    3780        49.8      49.8            100.0
        Total   7589        100.0     100.0
Figure 1. Sample Gender in Percentage
From Table 3, in the sample of 7589 respondents, 50.2% (3809) are boys and 49.8% (3780) are
girls. Figure 1 displays the gender percentages.

Table 4. School Year
                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Year 7    1481        19.5      19.5            19.5
        Year 8    1526        20.1      20.1            39.6
        Year 9    1580        20.8      20.8            60.4
        Year 10   1553        20.5      20.5            80.9
        Year 11   1449        19.1      19.1            100.0
        Total     7589        100.0     100.0
Figure 2. School Year in Percentage
From Table 4, the largest group of respondents comes from Year 9 (20.8%), followed by Year 10
(20.5%), Year 8 (20.1%), Year 7 (19.5%) and lastly Year 11 (19.1%). Figure 2 displays the
respondents' school year in percentage.
[Figure 2: bar chart titled "School Year in Percentage"; x-axis: School Year (Year 7 to Year 11); y-axis: Percentage (18 to 21)]

Table 5. Ever had a proper alcoholic drink
                         Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Yes            3222        42.5      43.1            43.1
          No             4256        56.1      56.9            100.0
          Total          7478        98.5      100.0
Missing   Not answered   111         1.5
Total                    7589        100.0
Figure 3. Ever had a proper alcoholic drink
From Table 5, it can be observed that 43% of respondents who answered have had a proper alcoholic
drink before. On the other hand, 57% have not had a proper alcoholic drink before.
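The difference between the Percent and Valid Percent columns in Table 5 comes down to the denominator used. The short sketch below recomputes both from the counts reported in Table 5; the variable names are illustrative only.

```python
# Counts copied from Table 5.
counts = {"Yes": 3222, "No": 4256}
missing = 111

valid_total = sum(counts.values())      # respondents who answered the question
overall_total = valid_total + missing   # all respondents in the sample

for answer, count in counts.items():
    percent = 100 * count / overall_total      # share of all respondents
    valid_percent = 100 * count / valid_total  # share of those who answered
    print(f"{answer}: {percent:.1f}% of all, {valid_percent:.1f}% of valid")
```

Valid percent excludes the 111 pupils who did not answer, which is why it is slightly higher than the overall percent.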
Next, a one-tailed test about the population proportion will be conducted.

3. One-Tailed Test About a Population Proportion (Hypothesis Test)
Rationale for conducting a one-tailed test about a population proportion:
National Statistic (2013) estimated that 45% of UK pupils (aged 11 to 15) had drunk alcohol at
least once. However, in the data used in this report, 3222 pupils out of a valid sample of 7478
UK students (43.1%) had drunk alcohol at least once. It is therefore in the interest of the
researcher to investigate whether the population proportion is really 45% or whether it is
lower, as the data used suggest.
H0: π = 0.45
H1: π < 0.45
Level of significance: 0.05
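The test statistic is calculated using the following formula:
$$ Z = \frac{p - \pi_0}{\sigma_p} $$
Where:
p = sample proportion
Standard error of the sample proportion:
$$ \sigma_p = \sqrt{\frac{\pi_0 (1 - \pi_0)}{n}} $$
Assuming $n\pi_0 \geq 5$ and $n(1 - \pi_0) \geq 5$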
Checking assumption:
7478 × 0.45 = 3365.1 ≥ 5 and 7478 × 0.55 = 4112.9 ≥ 5; therefore the assumption holds and the
researcher proceeds to calculate the test statistic.
Test statistic calculation:
$$ \sigma_p = \sqrt{\frac{\pi_0 (1 - \pi_0)}{n}} = \sqrt{\frac{0.45 \times 0.55}{7478}} = 0.00575 $$
$$ Z = \frac{p - \pi_0}{\sigma_p} = \frac{0.431 - 0.45}{0.00575} = -3.30 $$
Using the critical value approach:
At the 5% significance level, critical value = -1.645
Z = -3.30 < -1.645, therefore reject H0.

Checking using the p-value approach:
From the standard normal cumulative probability table, z = -3.30 gives p-value = 0.0005
p-value = 0.0005 < 0.05, therefore both approaches are consistent: reject H0.
There is sufficient evidence to reject H0 in favour of H1, as p-value = 0.0005 < 0.05.
The researcher concludes that the proportion of UK pupils who had drunk alcohol at least once
is less than 45%, at the 5% significance level.
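The one-tailed proportion test can be reproduced with the Python standard library. This is a sketch under the report's own figures (n = 7478, 3222 "Yes" answers, hypothesised proportion 0.45); the variable names are illustrative. Because it uses the unrounded sample proportion rather than 0.431, the resulting Z may differ slightly from the hand calculation of -3.30.

```python
from math import sqrt
from statistics import NormalDist

n = 7478          # valid responses (Table 5)
successes = 3222  # pupils who had ever had a proper alcoholic drink
pi_0 = 0.45       # hypothesised population proportion under H0
alpha = 0.05      # significance level

# Conditions for the normal approximation.
assert n * pi_0 >= 5 and n * (1 - pi_0) >= 5

p_hat = successes / n
std_error = sqrt(pi_0 * (1 - pi_0) / n)
z = (p_hat - pi_0) / std_error

std_normal = NormalDist()
critical_value = std_normal.inv_cdf(alpha)  # lower-tail critical value
p_value = std_normal.cdf(z)                 # lower-tail p-value

print(f"z = {z:.2f}, critical value = {critical_value:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z < critical_value else "do not reject H0")
```

Both decision rules (critical value and p-value) are evaluated from the same Z, so they necessarily lead to the same conclusion.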
Next, a Chi-square test of association will be conducted.

4. Chi-square test of association
Rationale for conducting a Chi-square test:
Borsari and Carey (2001) claimed that excessive drinking is associated with peer pressure
among university students. It is therefore in the interest of the researcher to test this claim
among young pupils. In addition, peer pressure is a nominal variable and student drinking
frequency, although treated as interval elsewhere in this report, is recorded in eight
categories. A Chi-square test on the resulting 2 × 8 contingency table is therefore appropriate.
H0: Peer pressure and student drinking frequency are independent
H1: Peer pressure and student drinking frequency are dependent
Level of significance: 0.05
The test statistic is calculated using the following formulas:
Calculate the expected table (see Appendix 2) using
$$ \hat{e}_{ij} = \frac{R_i C_j}{n} $$
Pearson Chi-square statistic (see Table 6) using
$$ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - \hat{e}_{ij})^2}{\hat{e}_{ij}} $$
Table 6. Chi-Square Tests
                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             155.105   7    .000
Likelihood Ratio               152.157   7    .000
Linear-by-Linear Association   130.174   1    .000
N of Valid Cases               6981
Compare with $\chi^2_{(r-1)(c-1),\,\alpha} = \chi^2_{(2-1)(8-1),\,0.05} = \chi^2_{7,\,0.05} = 14.067$
Since the Chi-square value = 155.105 > 14.067, there is sufficient evidence to reject H0. There is
an association between peer pressure and student drinking frequency, at the 5% significance level.
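The expected counts and the Pearson Chi-square statistic can be recomputed from the observed counts in the Appendix 2 cross tabulation. The sketch below uses plain Python with illustrative variable names; the critical value 14.067 is taken from the report rather than computed.

```python
# Observed counts copied from the cross tabulation in Appendix 2.
observed = [
    [10, 64, 96, 199, 267, 843, 133, 2599],  # peer pressure: True
    [7, 108, 147, 216, 252, 520, 68, 1452],  # peer pressure: False
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson Chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical_value = 14.067  # table value for df = 7, alpha = 0.05 (from the report)

print(f"chi-square = {chi_square:.3f}, df = {df}")
print("reject H0" if chi_square > critical_value else "do not reject H0")
```

The result should agree with the Pearson Chi-square value in Table 6 up to rounding.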
The Chi-square test only tests whether a relationship exists; therefore, the researcher uses the
Contingency coefficient, Cramer’s V and the Phi coefficient to measure the strength of association.
Table 7. Symmetric Measures
                                               Value   Approx. Sig.
Nominal by Nominal   Phi                       .149    .000
                     Cramer's V                .149    .000
                     Contingency Coefficient   .147    .000
N of Valid Cases                               6981

The Phi coefficient (Table 7) is calculated using the formula:
$$ \varphi = \frac{ad - bc}{\sqrt{R_0 R_1 C_0 C_1}} = 0.149 $$
Phi can take values in [-1, 1]. This formula applies to a 2 × 2 table; for the 2 × 8 table used
here, the reported value equals $\sqrt{\chi^2 / n} = \sqrt{155.105 / 6981} = 0.149$. Phi = 0.149
indicates a weak association. The significance value of 0.000 means the Phi value is significant.
Cramer’s V (Table 7) is calculated using the formula:
$$ V = \sqrt{\frac{\chi^2}{n \cdot \min(r - 1,\, c - 1)}} = 0.149 $$
Cramer’s V takes values between 0 and 1. V = 0.149 indicates a weak association. The significance
value of 0.000 means Cramer’s V is significant.
The Contingency coefficient (Table 7) is calculated using the formula:
$$ C = \sqrt{\frac{\chi^2}{\chi^2 + n}} = 0.147 $$
The Contingency coefficient takes values between 0 and 1. C = 0.147 indicates a weak association.
The significance value of 0.000 means the Contingency coefficient is significant.
Therefore, it can be concluded that there is an association between peer pressure and student
drinking frequency. However, the strength of association is not strong.
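The three strength-of-association measures can be checked from the Chi-square value and sample size reported in Table 6. This sketch assumes the Chi-square-based form of Phi, sqrt(chi-square / n), which is the usual form for tables larger than 2 × 2; the variable names are illustrative.

```python
from math import sqrt

chi_square = 155.105  # Pearson Chi-square from Table 6
n = 6981              # number of valid cases
rows, cols = 2, 8     # dimensions of the contingency table

# Assumption: Phi is computed from Chi-square, as is usual for non-2x2 tables.
phi = sqrt(chi_square / n)
cramers_v = sqrt(chi_square / (n * min(rows - 1, cols - 1)))
contingency_c = sqrt(chi_square / (chi_square + n))

print(f"Phi = {phi:.3f}, Cramer's V = {cramers_v:.3f}, C = {contingency_c:.3f}")
```

With only two rows, min(r - 1, c - 1) is 1, so Cramer's V coincides with the Chi-square-based Phi.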
Next, correlation analysis will be conducted.

5. Correlation
Rationale for conducting correlation analysis:
National Statistic (2012) reported that family attitude and student drinking frequency are
associated. The researcher is interested in investigating the strength and type of relationship
using correlation. Both variables are measured on a scale, so correlation analysis is suitable.
Key theory of correlation:
Correlation is a measure of linear association and does not necessarily indicate causation. The
correlation coefficient can take on values between -1 and +1. Values near -1 indicate a strong
negative linear relationship. Values near +1 indicate a strong positive linear relationship.
Table 8. Correlation between family attitude and drinking frequency
                                                           Family attitudes    Usual drinking
                                                           to pupil drinking   frequency
Family attitudes to pupil drinking   Pearson Correlation   1                   .553**
                                     Sig. (2-tailed)                           .000
                                     N                     7183                7147
Usual drinking frequency             Pearson Correlation   .553**              1
                                     Sig. (2-tailed)       .000
                                     N                     7147                7314
**. Correlation is significant at the 0.01 level (2-tailed).
From Table 8, it can be observed that parents' attitude to pupil drinking and usual drinking
frequency have a moderate positive linear correlation of 0.553, significant at the 0.01 level.
This indicates that student drinking frequency is related to parents' attitude. Family members
who disapprove of student drinking tend to be associated with lower drinking frequency.
Conversely, parents who do not mind student drinking tend to be associated with higher drinking
frequency.
The Pearson correlation is calculated using the following formula:
$$ r = \frac{s_{xy}}{s_x s_y} = \frac{\dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}}{\sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} \sqrt{\dfrac{\sum (y_i - \bar{y})^2}{n - 1}}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = 0.553 $$
where:
$s_{xy}$ = covariance (measure of the linear association between two variables)
$s_x$ = standard deviation of x
$s_y$ = standard deviation of y
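The Pearson correlation formula can be sketched in plain Python as follows. The function name `pearson_r` and the two lists of coded responses are hypothetical and made up for illustration; they are not taken from the survey, so the printed value will not equal 0.553.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; the n - 1 terms of covariance and std devs cancel."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical coded responses, made up for illustration (not from the dataset).
family_attitude = [1, 1, 2, 2, 3, 3, 4, 4]
drinking_frequency = [1, 2, 2, 3, 3, 5, 4, 6]
print(round(pearson_r(family_attitude, drinking_frequency), 3))
```

A perfectly linear increasing relationship gives r = 1 and a perfectly linear decreasing one gives r = -1, matching the range described in the key theory above.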

6. Further Analysis
The researcher would like to conduct multiple regression analysis on the dataset to model the
form of the relationship between pupil drinking behaviour (dependent variable) and other
independent variables. Besides that, factor analysis could be applied to identify and confirm the
dimensionality of existing scales. Furthermore, the researcher would also like to conduct
cluster analysis on the dataset to segment student drinkers based on different characteristics.
These further analyses would allow researchers to gain insight into student drinkers so that
effective actions could be taken to curb alcohol consumption among young pupils.
7. Limitation
The use of stratified sampling in this research could lead to sampling bias, as strata are
difficult to identify. In this research, the strata were divided according to school type
(comprehensive, secondary modern, grammar and private). One key assumption of stratified
sampling is that the strata are homogeneous. However, there is a possibility that the strata are
heterogeneous. For example, private schools include single-sex and mixed schools as well as
international schools.
Another limitation of this research was questionnaire administration. Students were given a
paper copy of the questionnaire and were asked to complete it within 60 minutes, under exam
conditions with teacher supervision. The use of a paper questionnaire led to many missing
values, as students did not answer all questions. Besides that, the questionnaire was too long,
so students might have lost interest and not completed it. Lastly, the presence of teacher
supervision might have pressured students to provide socially desirable answers.
Future researchers could conduct computer-administered questionnaires with skip logic and
compulsory questions. This would reduce the number of missing values. The duration of the
questionnaire could be shortened to around 20 minutes to prevent students from losing interest.
Lastly, teacher supervision could be removed to avoid any psychological pressure on students.
8. Conclusion
In conclusion, this report has presented descriptive and summary statistics about the sample. It
also conducted a hypothesis test on the population proportion of UK pupils who had drunk alcohol
at least once. A Chi-square test was conducted to ascertain whether peer pressure and drinking
frequency were associated. Correlation analysis was conducted to measure the strength and type
of relationship between family attitude and drinking frequency. Lastly, this report discussed
further analysis and the limitations of the dataset.

Appendix 1. Survey questions
Are you a boy or a girl?
Boy
Girl
Which year are you at school?
Year 7
Year 8
Year 9
Year 10
Year 11
How old are you now?
_______Years old
Have you ever had a proper alcoholic drink?
Yes
No
How often do you usually have an alcoholic drink in a month?
0-3 times
4-7 times
8-11 times
12-15 times
16-19 times
20-23 times
24-27 times (7)
28-31 times (8)
How do your parents/guardian feel about you drinking alcohol?
They won’t like me to drink alcohol at all (1)
They don’t like but allow me to drink limited amount
They won’t mind as long as I don’t drink too much
They would let me drink as much as I like

Write down the number of pints, half pints, large and small cans or bottles of alcohol that you
have consumed in the past 7 days.
_____Pints
_____Half pints
_____Large can
_____Small can
_____bottle
I drink due to peer pressure
Yes
No

Appendix 2. Cross Tab
Case Processing Summary
People my age drink because of pressure from friends * (D) Usual drinking frequency (8 cat)
Cases     N      Percent
Valid     6981   92.0%
Missing   608    8.0%
Total     7589   100.0%

People my age drink because of pressure from friends * (D) Usual drinking frequency (8 cat) Crosstabulation
Rows: People my age drink because of pressure from friends. Columns: (D) Usual drinking frequency (8 cat).
                         Almost      About twice   About once   About once    About once   A few times   Never        Never had
                         every day   a week        a week       a fortnight   a month      a year        drinks now   a drink     Total
True    Count            10          64            96           199           267          843           133          2599        4211
        Expected Count   10.3        103.8         146.6        250.3         313.1        822.2         121.2        2443.6      4211.0
False   Count            7           108           147          216           252          520           68           1452        2770
        Expected Count   6.7         68.2          96.4         164.7         205.9        540.8         79.8         1607.4      2770.0
Total   Count            17          172           243          415           519          1363          201          4051        6981
        Expected Count   17.0        172.0         243.0        415.0         519.0        1363.0        201.0        4051.0      6981.0
