
Business Statistics assignment 2014


  1. 1. MSc Marketing and Business Analysis Marketing Statistics 17/11/2014 B064536
  2. 2. Table of contents
     1. Description of secondary school student alcohol consumption dataset
     2. Descriptive and summary statistics
     3. One-tailed test about a population proportion
     4. Chi-square test of association
     5. Correlation
     6. Further analysis
     7. Limitation
     8. Conclusion
     List of references
     Appendices
       Appendix 1. Survey questions
       Appendix 2. Cross tab
     List of tables
       Table 1. Variables used
       Table 2. Summary statistics
       Table 3. Gender frequency
       Table 4. School year
       Table 5. Ever had a proper alcoholic drink
       Table 6. Chi-square test
       Table 7. Symmetric measures
       Table 8. Correlation between family attitude and drinking frequency
     List of figures
       Figure 1. Sample gender in percentage
       Figure 2. School year in percentage
       Figure 3. Ever had a proper alcoholic drink
  3. 3. 1. Description of secondary school student alcohol consumption dataset
     The secondary data was obtained from UK Data Service (2014). The dataset was collected through a survey conducted by the National Centre for Social Research (2012) on secondary school pupils (aged 11 to 15). The survey (see Appendix 1) aims to gain insight into the number of young alcohol drinkers and their drinking behaviour. A total of 7589 valid responses were gathered.
     The key reason for selecting this dataset is to gain insight into student drinkers so as to develop effective strategies to curb underage drinking. A body of evidence suggests that drinking at a young age, in particular heavy and regular drinking, can result in physical or mental problems and put children at risk of alcohol-related accident or injury. More broadly, it is also associated with missing or falling behind at school and with violent and antisocial behaviour. It is therefore necessary to develop strategies to tackle problem drinking at both national and local level.
     This report will first provide descriptive and summary statistics about the sample. Next, it will conduct a one-tailed hypothesis test about a population proportion to investigate the proportion of UK pupils who have drunk alcohol before. This is followed by a Chi-square test to ascertain whether peer pressure and student alcohol consumption frequency are associated. Correlation analysis will then be conducted to investigate the strength and type of relationship between family attitude and student drinking frequency. The report will also discuss further analysis and the limitations of the dataset.
  4. 4. 2. Descriptive and Summary Statistics
     Table 1. Variables used
     Variable name                          Measurement   Analysis conducted
     Age                                    Ratio         Summary statistics
     Gender                                 Nominal       Summary statistics
     School year                            Ordinal       Summary statistics
     Units of alcohol drunk in last 7 days  Ratio         Summary statistics
     Ever had a proper alcoholic drink      Nominal       Summary statistics and hypothesis test
     Peer pressure                          Nominal       Chi-square
     Family attitude to pupil drinking      Interval      Correlation
     Monthly usual drinking frequency       Interval      Summary statistics, Chi-square, correlation

     Table 2. Summary Statistics
                      Age 11-15   Units of alcohol drunk in last 7 days   Usual drinking frequency (monthly)
     N Valid          7589        7172                                    7314
     N Missing        0           417                                     275
     Mean             13.1735     1.4194                                  6.7829
     Median           13.0000     1.0000                                  8.0000
     Mode             15.00       1.00                                    8.00
     Std. Deviation   1.39074     1.47192                                 1.68427
     Minimum          11.00       1.00                                    1.00
     Maximum          15.00       8.00                                    8.00

     From Table 2, it can be observed that the mean age of the sample is approximately 13 years. For students who have drunk alcohol before, the mean consumption was 1.4 units. The mean usual monthly drinking frequency is 6.78 on the eight-category scale (coded 1 to 8). The three variables analysed in Table 2 have sample standard deviations of 1.684 (usual drinking frequency), 1.472 (units of alcohol drunk) and 1.391 (age) respectively. This shows that there is little variability in each variable analysed.
     Sample standard deviation is calculated using the formula: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
     Next, some sample characteristics will be presented using frequency tables and charts.
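The sample standard deviation formula in Section 2 can be checked with a short Python sketch. The ages below are hypothetical values chosen for illustration, not the survey responses, and `sample_std` is an illustrative helper name; the standard library's `statistics.stdev` is printed alongside as a cross-check.

```python
from math import sqrt
from statistics import median, mode, stdev

# Hypothetical ages for illustration only; NOT the survey data.
ages = [11, 12, 12, 13, 13, 14, 15, 15, 15]


def sample_std(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return sqrt(squared_deviations / (n - 1))


print("mean:", sum(ages) / len(ages))
print("median:", median(ages))
print("mode:", mode(ages))
print("sample std (formula):", sample_std(ages))
print("sample std (statistics.stdev):", stdev(ages))
```

The two standard deviation lines should agree, since both divide the sum of squared deviations by n - 1.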
  5. 5. Table 3. Gender Frequency
                    Frequency   Percent   Valid Percent   Cumulative Percent
     Valid  Boy     3809        50.2      50.2            50.2
            Girl    3780        49.8      49.8            100.0
            Total   7589        100.0     100.0

     Figure 1. Sample Gender in Percentage
     From Table 3, in the sample of 7589 respondents, 50.2% (3809) are boys and 49.8% (3780) are girls. Figure 1 displays the gender percentages.
  6. 6. Table 4. School Year
                     Frequency   Percent   Valid Percent   Cumulative Percent
     Valid  Year 7   1481        19.5      19.5            19.5
            Year 8   1526        20.1      20.1            39.6
            Year 9   1580        20.8      20.8            60.4
            Year 10  1553        20.5      20.5            80.9
            Year 11  1449        19.1      19.1            100.0
            Total    7589        100.0     100.0

     Figure 2. School Year in Percentage (bar chart; x-axis: School Year, Year 7 to Year 11; y-axis: Percentage)
     From Table 4, the largest share of respondents comes from Year 9 (20.8%), followed by Year 10 (20.5%), Year 8 (20.1%), Year 7 (19.5%) and lastly Year 11 (19.1%). Figure 2 displays the respondents' school year in percentage.
  7. 7. Table 5. Ever had a proper alcoholic drink
                             Frequency   Percent   Valid Percent   Cumulative Percent
     Valid    Yes            3222        42.5      43.1            43.1
              No             4256        56.1      56.9            100.0
              Total          7478        98.5      100.0
     Missing  Not answered   111         1.5
     Total                   7589        100.0

     Figure 3. Ever had a proper alcoholic drink
     From Table 5, it can be observed that 43% of valid respondents have had a proper alcoholic drink before, while 57% have not. Next, a one-tailed test about a population proportion will be conducted.
  8. 8. 3. One-Tailed Test About a Population Proportion (Hypothesis Test)
     Rationale for conducting a one-tailed test about a population proportion: National Statistic (2013) estimated that 45% of UK pupils (aged 11 to 15) had drunk alcohol at least once. However, the data used in this report show that in a valid sample of 7478 UK students, 3222 pupils had drunk alcohol at least once, a sample proportion of p = 3222/7478 = 0.431. It is therefore in the interest of the researcher to investigate whether the population proportion is really 45% or whether it is lower, as the data suggest.
     H0: π = 0.45
     H1: π < 0.45
     Level of significance: 0.05
     The test statistic is calculated using the formula $Z = \frac{p - \pi_0}{\sigma_p}$, where the standard error of the sample proportion is $\sigma_p = \sqrt{\frac{\pi_0 (1 - \pi_0)}{n}}$, assuming $n\pi_0 \geq 5$ and $n(1 - \pi_0) \geq 5$.
     Checking the assumption: 7478 × 0.45 = 3365.1 ≥ 5 and 7478 × 0.55 = 4112.9 ≥ 5, therefore the assumption holds and the researcher proceeds to calculate the test statistic.
     Test statistic calculation:
     $\sigma_p = \sqrt{\frac{0.45 \times 0.55}{7478}} = 0.00575$
     $Z = \frac{0.431 - 0.45}{0.00575} = -3.30$
     Using the critical value approach: at the 5% significance level, the critical value is -1.645. Z = -3.30 < -1.645, therefore reject H0.
  9. 9. Checking using the p-value approach: from the standard normal cumulative probability table, for z = -3.30 the p-value is 0.0005. Since 0.0005 < 0.05, both approaches are consistent: reject H0.
     There is sufficient evidence to reject H0 in favour of H1, as p-value = 0.0005 < 0.05. The researcher concludes that the proportion of UK pupils who have drunk alcohol at least once is less than 45%, at the 5% significance level. Next, a Chi-square test of association will be conducted.
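The calculation in Section 3 can be reproduced with a minimal Python sketch using only the standard library. The function name `one_tailed_proportion_z_test` is illustrative, not from the report. Because the sketch uses the unrounded sample proportion 3222/7478 rather than 0.431, its z value comes out slightly more negative than the hand-calculated -3.30; the decision to reject H0 is the same.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_proportion_z_test(successes, n, pi0, alpha=0.05):
    """Lower-tailed z-test of H0: pi = pi0 against H1: pi < pi0."""
    # Normal-approximation check used in the text.
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("normal approximation is not justified")
    p_hat = successes / n                      # sample proportion
    std_error = sqrt(pi0 * (1 - pi0) / n)      # standard error under H0
    z = (p_hat - pi0) / std_error              # test statistic
    p_value = NormalDist().cdf(z)              # lower-tail probability
    critical = NormalDist().inv_cdf(alpha)     # about -1.645 for alpha = 0.05
    return z, p_value, critical, z < critical


# Figures taken from Table 5 and the hypothesised proportion of 0.45.
z, p_value, critical, reject = one_tailed_proportion_z_test(3222, 7478, 0.45)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, "
      f"critical value = {critical:.3f}, reject H0: {reject}")
```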
  10. 10. 4. Chi-square test of association
     Rationale for conducting the Chi-square test: Borsari and Carey (2001) claimed that excessive drinking is associated with peer pressure among university students. It is therefore in the interest of the researcher to test this claim among young pupils. Peer pressure is a nominal variable and student drinking frequency is recorded in eight categories, so a Chi-square test on the cross-tabulation is appropriate.
     H0: Peer pressure and student drinking frequency are independent
     H1: Peer pressure and student drinking frequency are dependent
     Level of significance: 0.05
     The test statistic is calculated as follows. First, calculate the expected counts (see Appendix 2) using $\hat{e}_{ij} = \frac{R_i C_j}{n}$, where $R_i$ is the total of row i, $C_j$ is the total of column j and n is the number of valid cases. Then calculate the Pearson Chi-square statistic (see Table 6) using $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - \hat{e}_{ij})^2}{\hat{e}_{ij}}$.

     Table 6. Chi-Square Tests
                                   Value     df   Asymp. Sig. (2-sided)
     Pearson Chi-Square            155.105   7    .000
     Likelihood Ratio              152.157   7    .000
     Linear-by-Linear Association  130.174   1    .000
     N of Valid Cases              6981

     Compare with $\chi^2_{(r-1)(c-1),\,\alpha} = \chi^2_{(2-1)(8-1),\,0.05} = \chi^2_{7,\,0.05} = 14.067$.
     Since the Chi-square value 155.105 > 14.067, there is sufficient evidence to reject H0. There is an association between peer pressure and student drinking frequency, at the 5% significance level.
     The Chi-square test only shows whether a relationship exists; therefore, the researcher uses the Contingency coefficient, Cramer's V and the Phi coefficient to measure the strength of association.

     Table 7. Symmetric Measures
                                                  Value   Approx. Sig.
     Nominal by Nominal  Phi                      .149    .000
                         Cramer's V               .149    .000
                         Contingency Coefficient  .147    .000
     N of Valid Cases                             6981
  11. 11. Phi coefficient (Table 7): for a 2×2 table with cell counts a, b, c, d and row and column totals $R_0, R_1, C_0, C_1$, Phi is calculated as $\varphi = \frac{ad - bc}{\sqrt{R_0 R_1 C_0 C_1}}$ and takes values in [-1, 1]. For the 2×8 table used here, the reported value is equivalent to $\varphi = \sqrt{\chi^2 / n} = 0.149$, which is non-negative and carries no direction. Phi = 0.149 indicates a weak association. The significance value of 0.000 means the Phi value is significant.
     Cramer's V (Table 7) is calculated using the formula $V = \sqrt{\frac{\chi^2}{n \cdot \min(r - 1,\, c - 1)}} = 0.149$. Cramer's V takes values between 0 and 1. V = 0.149 indicates a weak association. The significance value of 0.000 means Cramer's V is significant.
     Contingency coefficient (Table 7) is calculated using the formula $C = \sqrt{\frac{\chi^2}{\chi^2 + n}} = 0.147$. The Contingency coefficient takes values between 0 and 1. C = 0.147 indicates a weak association. The significance value of 0.000 means the Contingency coefficient is significant.
     Therefore, it can be concluded that there is an association between peer pressure and student drinking frequency. However, the strength of the association is weak. Next, correlation analysis will be conducted.
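As a cross-check on Section 4, the following Python sketch recomputes the Pearson Chi-square statistic, Cramer's V and the Contingency coefficient from the observed counts in Appendix 2. The function name `chi_square_association` is illustrative, and the sketch is a plain re-implementation of the formulas in the text, not the statistical package output itself.

```python
from math import sqrt

# Observed counts from the Appendix 2 crosstab.
# Rows: peer pressure True / False.
# Columns: the eight usual drinking frequency categories.
observed = [
    [10, 64, 96, 199, 267, 843, 133, 2599],
    [7, 108, 147, 216, 252, 520, 68, 1452],
]


def chi_square_association(table):
    """Return chi-square, degrees of freedom, n, Cramer's V and C."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count
            chi2 += (o - e) ** 2 / e
    rows, cols = len(table), len(table[0])
    df = (rows - 1) * (cols - 1)
    cramers_v = sqrt(chi2 / (n * min(rows - 1, cols - 1)))
    contingency_c = sqrt(chi2 / (chi2 + n))
    return chi2, df, n, cramers_v, contingency_c


chi2, df, n, cramers_v, contingency_c = chi_square_association(observed)
print(f"chi-square = {chi2:.3f}, df = {df}, n = {n}")
print(f"Cramer's V = {cramers_v:.3f}, "
      f"contingency coefficient = {contingency_c:.3f}")
```

The results should closely match Tables 6 and 7; small differences from hand calculation can arise because the expected counts printed in Appendix 2 are rounded to one decimal place.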
  12. 12. 5. Correlation
     Rationale for conducting correlation analysis: National Statistic (2012) reported that family attitude and student drinking frequency are associated. The researcher is interested in investigating the strength and type of this relationship using correlation. Both variables are treated as scale measurements, so correlation analysis is suitable.
     Key theory of correlation: correlation is a measure of linear association and does not necessarily indicate causation. The correlation coefficient takes values between -1 and +1. Values near -1 indicate a strong negative linear relationship. Values near +1 indicate a strong positive linear relationship.

     Table 8. Correlation between family attitude and drinking frequency
                                                              Family attitudes to pupil drinking   Usual drinking frequency
     Family attitudes to pupil drinking  Pearson Correlation  1                                    .553**
                                         Sig. (2-tailed)                                           .000
                                         N                    7183                                 7147
     Usual drinking frequency            Pearson Correlation  .553**                               1
                                         Sig. (2-tailed)      .000
                                         N                    7147                                 7314
     **. Correlation is significant at the 0.01 level (2-tailed).

     From Table 8, it can be observed that family attitude to pupil drinking and usual drinking frequency have a moderate positive linear correlation of 0.553, significant at the 0.01 level. Student drinking frequency is therefore related to family attitude. Family members who disapprove of student drinking tend to be associated with lower drinking frequency. Conversely, parents who do not mind student drinking tend to be associated with higher drinking frequency.
     The Pearson correlation is calculated using the following formula:
     $r = \frac{s_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y}) / (n - 1)}{\sqrt{\sum (x_i - \bar{x})^2 / (n - 1)} \, \sqrt{\sum (y_i - \bar{y})^2 / (n - 1)}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = 0.553$
     where:
     $s_{xy}$ = covariance (a measure of the linear association between two variables)
     $s_x$ = standard deviation of x
     $s_y$ = standard deviation of y
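The Pearson formula in Section 5 can be sketched in Python as follows. The raw survey responses are not reproduced in this report, so the two lists below are hypothetical coded values used only to exercise the formula; they will not reproduce r = 0.553. The names `pearson_r`, `attitude` and `frequency` are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation r = s_xy / (s_x * s_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample covariance and sample standard deviations (n - 1 denominator).
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return s_xy / (s_x * s_y)


# Hypothetical coded responses for illustration only; NOT the survey data.
attitude = [1, 1, 2, 2, 3, 3, 4, 4]
frequency = [1, 2, 2, 3, 3, 5, 4, 6]
print(f"r = {pearson_r(attitude, frequency):.3f}")
```

The n - 1 terms cancel between numerator and denominator, which is why the last form of the formula in the text contains only sums of deviations.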
  13. 13. 6. Further Analysis
     The researcher would like to conduct multiple regression analysis on the dataset to model the form of the relationship between pupil drinking behaviour (dependent variable) and other independent variables. In addition, factor analysis could be applied to identify and confirm the dimensionality of existing scales. The researcher would also like to conduct cluster analysis on the dataset to segment student drinkers based on different characteristics. These further analyses would allow researchers to gain insight into student drinkers so that effective action can be taken to curb alcohol consumption among young pupils.
     7. Limitation
     The use of stratified sampling in this research could lead to sampling bias, as strata are difficult to identify. In this research, the strata were divided according to school type (comprehensive, secondary modern, grammar and private). One key assumption of stratified sampling is that each stratum is homogeneous. However, the strata may in fact be heterogeneous. For example, private schools include single-sex and mixed schools, as well as international schools.
     Another limitation of this research was questionnaire administration. Students were given a paper copy of the questionnaire and were asked to complete it within 60 minutes, under exam conditions with teacher supervision. The use of a paper questionnaire led to many missing values, as students did not answer all questions. The questionnaire was also too long, so students might have lost interest and not completed it. Lastly, the presence of teacher supervision might have pressured students to provide socially desirable answers.
     Future researchers could conduct computer-administered questionnaires with skip logic and compulsory questions. This would reduce the number of missing values. The duration of the questionnaire could be shortened to around 20 minutes to prevent students from losing interest. Lastly, there would be no teacher supervision, to avoid any psychological pressure on students.
     8. Conclusion
     In conclusion, this report has presented descriptive and summary statistics about the sample. It also conducted a hypothesis test on the population proportion of UK pupils who have drunk alcohol at least once. A Chi-square test was conducted to ascertain whether peer pressure and drinking frequency are associated. Correlation analysis was conducted to measure the strength and type of relationship between family attitude and drinking frequency. Lastly, this report discussed further analysis and the limitations of the dataset.
  14. 14. References
  15. 15. Appendix 1. Survey questions
     Are you a boy or a girl?
       Boy
       Girl
     Which year are you at school?
       Year 7
       Year 8
       Year 9
       Year 10
       Year 11
     How old are you now?
       _______ years old
     Have you ever had a proper alcoholic drink?
       Yes
       No
     How often do you usually have an alcoholic drink in a month?
       0-3 times
       4-7 times
       8-11 times
       12-15 times
       16-19 times
       20-23 times
       24-27 times (7)
       28-31 times (8)
     How do your parents/guardian feel about you drinking alcohol?
       They won't like me to drink alcohol at all (1)
       They don't like it but allow me to drink a limited amount
       They won't mind as long as I don't drink too much
       They would let me drink as much as I like
  16. 16. Write down the number of pints, half pints, large and small cans or bottles of alcohol that you have consumed in the past 7 days.
       _____ Pints
       _____ Half pints
       _____ Large can
       _____ Small can
       _____ Bottle
     I drink due to peer pressure
       Yes
       No
  17. 17. Appendix 2. Cross Tab
     Case Processing Summary
     Cross-tabulation: People my age drink because of pressure from friends * (D) Usual drinking frequency (8 cat)
                N      Percent
     Valid      6981   92.0%
     Missing    608    8.0%
     Total      7589   100.0%
  18. 18. People my age drink because of pressure from friends * (D) Usual drinking frequency (8 cat) Crosstabulation
     (Rows: usual drinking frequency; columns: response to "People my age drink because of pressure from friends")
     Usual drinking frequency   True: Count   True: Expected   False: Count   False: Expected   Total: Count   Total: Expected
     Almost every day           10            10.3             7              6.7               17             17.0
     About twice a week         64            103.8            108            68.2              172            172.0
     About once a week          96            146.6            147            96.4              243            243.0
     About once a fortnight     199           250.3            216            164.7             415            415.0
     About once a month         267           313.1            252            205.9             519            519.0
     A few times a year         843           822.2            520            540.8             1363           1363.0
     Never drinks now           133           121.2            68             79.8              201            201.0
     Never had a drink          2599          2443.6           1452           1607.4            4051           4051.0
     Total                      4211          4211.0           2770           2770.0            6981           6981.0
