Lesson 15 Data Analysis I


1. 1. Data Analysis I: Tests of Significance, Sampling Statistics, Chi-Square Analysis and Analysis of Variance<br />Lesson 15<br />
2. 2. Continuous and Categorical Variables<br />Continuous variables are ones which can be quantified or measured on a continuum, and which can take any value from zero to the largest number possible in the series.<br />Example :<br />A respondent’s age in years<br />Number of miles he or she drives annually<br />Percentages <br />Household’s annual expenditure on all insurance<br />Quantum of Coffee consumed annually by households<br />Interval scales used in attitude measurement<br />
3. 3. Two Types of Categorical Variables<br />One type consists of variables which can only be measured in classes or categories.<br />Such types of categorical variables are not quantifiable.<br />Respondent’s sex: m/f<br />Voter status: can vote, cannot vote<br />Respondent’s occupation<br />
4. 4. Two Types of Categorical Variables<br />A second type of categorical variable includes ones which are more conveniently measured in categories than on a continuum.<br />Different categories are associated with quantifiable numbers which show a progression from smaller values to larger values.<br />Age<br />Income<br />
5. 5. Tests of Statistical Significance<br />Differences in study findings have been observed between at least two distinct subpopulations, groups, classes or categories.<br />Differences in consumption behavior of heavy vs light users of any product category <br />By their nature, all categorical variables result in the creation of two or more classes or categories, and often the differences observed between any two of them will be subjected to a test of statistical significance.<br />
6. 6. Tests of Statistical Significance<br />Researchers will have asked the same question of each group and/or measured the same variable within each group.<br />Example: the average quantum of coffee consumed annually per household in City A and City B.<br />The two average coffee consumption figures are then tested for statistical significance.<br />
7. 7. Which method of significance testing to use in a given situation?<br />1. The number and format of the categories in which differences are observed are important.<br />If the researchers are interested in the differences between two columns or two rows of numbers, Chi-Square analysis is used to test the significance of the observed differences.<br />E.g., ascertain whether the pattern of heavy, moderate, light and non-users observed in City A differs significantly from the corresponding pattern observed in City B in coffee consumption data.<br />
8. 8. Which method of significance testing to use in a given situation?<br />Some experiments result in a single table of findings and researchers can use Analysis of Variance to test the significance of the differences observed within that one table.<br />
9. 9. Which method of significance testing to use in a given situation?<br />2. Are the figures which are recorded in each category actual counts of the number of responses or are they percentages?<br />Both Chi-square analysis and analysis of variance can only be applied to the actual number of responses falling into each category, and cannot be applied to percentages.<br />
10. 10. Which method of significance testing to use in a given situation?<br />3. Whether the data were obtained through experimentation or through descriptive field surveys also influences the decision on which method to use.<br />Experiments or Field Surveys (sampling statistics and chi-square)<br />
11. 11. Four – Step procedure to determine which Test to Apply?<br />1. Observe a difference which can have important implications for the Marketing Manager.<br />In any set of findings researchers can observe many differences, but only a few of those differences might have important implications for the marketing manager.<br />Those are the differences which should be tested for significance.<br />
12. 12. Four – Step procedure to determine which Test to Apply?<br />2. State a hypothesis which can be tested using one of the three methods: sampling statistics, chi-square analysis or analysis of variance.<br />When applying significance tests, researchers often hypothesize that the observed difference is not a statistically significant one; that is, they hypothesize that the observed difference could easily have occurred by chance because of sampling variation. This is called a Null Hypothesis.<br />Reject the Null Hypothesis if able to conclude that the observed difference is too large to have occurred by chance due to sampling variation.<br />In that case the observed difference must reflect a real difference which exists in the population being studied.<br />When such evidence is obtained, the conclusion is that the observed difference is statistically significant.<br />If researchers are unable to obtain such evidence, they will not be able to reject their null hypothesis and they will not be able to conclude that the observed difference is statistically significant.<br />Also specify the confidence level researchers would like to have when they decide whether or not to reject their null hypothesis.<br />
13. 13. Four – Step procedure to determine which Test to Apply?<br />3. Calculate the appropriate “statistic” which quantifies the observed difference relative to the sample size which was used to gather the data.<br />It is always necessary to calculate a statistic which quantifies the observed difference in relation to the sample size used in the research.<br />If the observed differences are large and/or if the samples used are large, the calculated statistic will have a larger value.<br />If the observed differences are small and/or if the samples used are small, the calculated statistic will have a smaller value.<br />
14. 14. Four – Step procedure to determine which Test to Apply?<br />4. Check to see if the calculated value of the statistic is large enough to allow researchers to conclude that the observed difference is statistically significant.<br />If the calculated value of the statistic is larger than a certain “critical value”, then reject the null hypothesis. The critical value is read from a table of the critical values of the appropriate statistic.<br />
15. 15. If calculated value is bigger than the Critical Value …<br />If the calculated value is as big as or bigger than the critical value, then the observed difference is too big to be attributable to sampling variation. The researchers will reject their null hypothesis and conclude, with a 95 percent level of confidence if that was the level chosen, that the observed difference is statistically significant, i.e., the observed difference reflects a real difference existing in the population studied.<br />
16. 16. If calculated value is smaller than the Critical Value…<br />If the calculated value is smaller than the critical value, researchers will not have the evidence they need to reject their null hypothesis, and so they would be unable to conclude that the observed difference is statistically significant.<br />
17. 17. Application when Observed Differences are Two Percentages<br />Sampling Statistics are commonly applied when comparing:<br />Observed percentages in two different groups.<br />An observed percentage with an expected percentage.<br />
18. 18. Comparing percentages from two samples<br />Observe an important difference.<br />Null Hypothesis.<br />Calculate the appropriate statistic:<br />$Z = \dfrac{\text{After}\% - \text{Before}\%}{S_{\text{difference}}}$<br />$S_{\text{difference}} = \sqrt{S_a^2 + S_b^2}$<br />$S_{\text{difference}}$ = Estimated standard error of the difference<br />$S_a$ = Estimated standard error of the after sample percentage<br />$S_b$ = Estimated standard error of the before sample percentage<br />$S_p = \sqrt{pq/n}$<br />$p$ = Percentage of the universe with the characteristic being studied<br />$q = 100\% - p$<br />$n$ = Size of sample<br />
19. 19. Comparing percentages from two samples<br />Compare the calculated and critical values of Z<br />Desired Confidence Level Critical Z Value<br />90% 1.64<br />95% 1.96<br />99% 2.58<br />
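The two-sample calculation above can be sketched in Python as follows. This is a minimal illustration, not part of the original lesson: the before and after percentages and the sample sizes are hypothetical, and the sample percentage is assumed to stand in for the universe percentage $p$ when estimating each standard error.

```python
import math

def standard_error_pct(p, n):
    """Standard error of a percentage p (on a 0-100 scale) for sample size n."""
    q = 100.0 - p
    return math.sqrt(p * q / n)

def z_two_percentages(after_pct, n_after, before_pct, n_before):
    """Z statistic for the difference between two sample percentages."""
    s_a = standard_error_pct(after_pct, n_after)
    s_b = standard_error_pct(before_pct, n_before)
    s_difference = math.sqrt(s_a ** 2 + s_b ** 2)
    return (after_pct - before_pct) / s_difference

# Hypothetical data: 40% before (n = 400) and 50% after (n = 400).
z = z_two_percentages(50.0, 400, 40.0, 400)

# Critical Z values from the slide above.
critical_z = {"90%": 1.64, "95%": 1.96, "99%": 2.58}
for level, critical in critical_z.items():
    verdict = "reject" if abs(z) >= critical else "cannot reject"
    print(f"{level}: Z = {z:.2f}, critical = {critical}, {verdict} the null hypothesis")
```

With these made-up figures the standard error of the difference works out to 3.5 percentage points, so the 10-point difference gives a Z of about 2.86, which exceeds all three critical values.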
20. 20. Comparing observed percentage with expected percentage<br />Observe an important difference<br />Null Hypothesis<br />Calculate the appropriate statistic:<br />$Z = \dfrac{\text{Expected}\% - \text{Observed}\%}{S_p}$<br />$S_p = \sqrt{pq/n}$<br />$S_p$ = Standard error of a percentage<br />
21. 21. Comparing observed percentage with expected percentage<br />Compare the calculated and critical values of Z<br />Desired Confidence Level Critical Z Values for One-Tail Tests of Significance:<br />90% 1.28<br />95% 1.64<br />99% 2.33<br />
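A minimal Python sketch of the observed-versus-expected comparison follows. The figures are hypothetical, and the standard error is assumed to be computed from the expected percentage, which the slides do not state explicitly.

```python
import math

def z_expected_vs_observed(expected_pct, observed_pct, n):
    """Z = (Expected% - Observed%) / Sp, with Sp based on the expected percentage."""
    q = 100.0 - expected_pct
    s_p = math.sqrt(expected_pct * q / n)
    return (expected_pct - observed_pct) / s_p

# Hypothetical survey: 30% expected, 25% observed among 525 respondents.
z = z_expected_vs_observed(30.0, 25.0, 525)

# One-tail critical Z values from the slide above.
one_tail_critical = {"90%": 1.28, "95%": 1.64, "99%": 2.33}
significant_at = [level for level, c in one_tail_critical.items() if z >= c]
print(f"Z = {z:.2f}; significant at: {significant_at}")
```

Here the standard error is 2 percentage points, so Z is 2.5 and the shortfall is significant at all three one-tail confidence levels.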
22. 22. One Tail Vs Two Tail Tests<br />A one-tail test uses only one tail of the distribution of sample means.<br />Example: a firm will acquire a company only if that company has a market share of 30%.<br />The firm runs a survey and computes the market share percentage; if the market share in the sample survey is less than 30%, it will not acquire the company. This is a One Tail Test.<br />However, if the share could also be more than 30% and the firm would like to know how much more, both sides are of interest and this is a Two Tail Test.<br />A one-tail test applies when one is interested in testing the significance of findings on one side only, here the lower side of the expected percentage. The one-sided nature of this testing causes it to be referred to as a “one-tail test”.<br />
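The one-tail and two-tail critical Z values quoted in the two tables can be reproduced from the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `critical_z` is an illustrative choice, not something from the lesson.

```python
from statistics import NormalDist

def critical_z(confidence, tails):
    """Critical Z for a confidence level (e.g. 0.95) and 1 or 2 tails."""
    alpha = 1.0 - confidence
    # A two-tail test splits alpha across both tails; a one-tail test does not.
    return NormalDist().inv_cdf(1.0 - alpha / tails)

for confidence in (0.90, 0.95, 0.99):
    two = critical_z(confidence, tails=2)
    one = critical_z(confidence, tails=1)
    print(f"{confidence:.0%}: two-tail {two:.2f}, one-tail {one:.2f}")
```

Because a one-tail test puts all of the allowed error in a single tail, its critical value at a given confidence level is smaller than the two-tail value.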
23. 23. Application when Observed Differences are Averages<br />$S_{\bar{x}} = S/\sqrt{n}$<br />Where:<br />$S_{\bar{x}}$ = Estimated standard error of the mean<br />$S$ = Standard deviation of the sample<br />$n$ = Number of observations in the sample<br />$S_{\text{difference}} = \sqrt{(S_{\bar{x}} \text{ for City A})^2 + (S_{\bar{x}} \text{ for City B})^2}$<br />$Z = \dfrac{\text{City A mean} - \text{City B mean}}{S_{\text{difference}}}$<br />
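As a rough illustration of the formulas for averages, the Python sketch below computes Z for two city means. All numbers (means, standard deviations, sample sizes) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def standard_error_mean(s, n):
    """Estimated standard error of the mean: S / sqrt(n)."""
    return s / math.sqrt(n)

def z_two_means(mean_a, s_a, n_a, mean_b, s_b, n_b):
    """Z statistic for the difference between two sample means."""
    se_a = standard_error_mean(s_a, n_a)
    se_b = standard_error_mean(s_b, n_b)
    s_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / s_difference

# Hypothetical annual coffee consumption per household (kg):
# City A: mean 12.0, standard deviation 4.0, 100 households
# City B: mean 10.5, standard deviation 3.0, 100 households
z = z_two_means(12.0, 4.0, 100, 10.5, 3.0, 100)
print(f"Z = {z:.2f}")  # compare with 1.96 for 95% confidence (two-tail)
```

With these figures the standard errors are 0.4 and 0.3, the standard error of the difference is 0.5, and Z is about 3.0.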
24. 24. Use the t Distribution when samples are small<br />If the sample is small (30 or fewer respondents), the results can easily be affected by atypical items (i.e., a few very large values and/or a few very small values). Thus, the theory of the normal distribution cannot be used, since it is based on a sample sufficiently large to balance out the extreme cases.<br />The means of small samples follow the t distribution.<br />With the t distribution, use a larger number of standard errors in order to obtain a specified level of confidence.<br />
25. 25. Chi-Square Analysis<br />A Chi-square Analysis can be used when the following conditions hold:<br />There must be either two observed sets of data, or one observed set of data and one expected set of data. Typically these data sets are in table form (R rows and C columns) or in frequency distribution form (one row and C columns, or R rows and one column).<br />The two sets of data must be based on the same sample size.<br />Each cell in the data contains an observed or expected count which is five or larger.<br />The different cells in a row or column can represent either categorical variables (e.g., male, female) or continuous variable data which have been placed into classes or categories (e.g., age data placed into under 25, 25-40, over 40 categories).<br />The most common uses involve:<br />Two observed frequency distributions<br />An observed frequency distribution with an expected one<br />Two large tables of data.<br />
26. 26. Chi Square Statistic (χ²)<br />$\chi^2 = \sum_{i=1}^{k} \dfrac{(f_i - F_i)^2}{F_i}$<br />Where:<br />$k$ = The number of cells<br />$i$ = The ith cell (where i = 1, 2, 3, …, k)<br />$f_i$ = The observed count in the ith cell<br />$F_i$ = The “expected” count in the ith cell<br />
27. 27. Number of Degrees of Freedom<br />To determine the number of degrees of freedom associated with the observed set of data (e.g., a column of data consisting of 4 cells):<br />In general, the degrees of freedom (d.f.) associated with column or row data consisting of k cells is:<br />d.f. = k-1 (so a column of 4 cells has d.f. = 3)<br />Think of degrees of freedom as the number of categories in the observed set of data that are free to vary once the total is fixed.<br />d.f. = (R-1)(C-1) for a table of R rows and C columns.<br />
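The chi-square statistic and its degrees of freedom for a single row or column of counts can be sketched as below. The usage counts are hypothetical, and the function checks two of the conditions listed earlier (equal sample sizes and cell counts of five or more) as an illustrative safeguard.

```python
def chi_square(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for k cells of counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("both sets of data must be based on the same sample size")
    if min(observed) < 5 or min(expected) < 5:
        raise ValueError("every cell count should be five or larger")
    statistic = sum((f - F) ** 2 / F for f, F in zip(observed, expected))
    degrees_of_freedom = len(observed) - 1  # d.f. = k - 1
    return statistic, degrees_of_freedom

# Hypothetical heavy / moderate / light / non-user counts (100 respondents each).
observed_city_a = [30, 40, 20, 10]
expected_from_city_b = [25, 35, 25, 15]

stat, df = chi_square(observed_city_a, expected_from_city_b)
print(f"chi-square = {stat:.2f} with {df} degrees of freedom")
# The standard-table critical value for 3 d.f. at 95% confidence is 7.81,
# so this hypothetical difference would not be judged significant.
```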
28. 28. Examples where Chi-Square helps<br />Data relating to different markets<br />Different market segments<br />Different type of packages<br />Different advertising copies<br />Useful for testing the significance of the difference observed between two sets of categorical data.<br />
29. 29. Analysis of Variance<br />Analysis of Variance can be applied to data using one test variable, such as a test to measure the sales appeal of three different package designs or a test to measure the elasticity of four different prices. Or there can be a combination of price and package variables.<br />If only one test variable is used, it is called one-way analysis of variance or one-way ANOVA.<br />If two test variables are used, it is called two-way analysis of variance or two-way ANOVA.<br />
30. 30. One-way Analysis of Variance<br />The Bell Baking Company was interested in evaluating the sales effect of two different colours (the test variable) for the package of one of its cookie products. <br />The firm selected 10 stores with similar monthly sales of cookies and randomly split them into two groups of five stores each.<br />One group of stores was stocked only with red packages while the other group of stores was stocked only with blue packages.<br />All stores were monitored for two weeks to make certain that the packages were properly displayed and that no stock-outs occurred.<br />
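31. 31. Sales Test of Package Colours<br />(Table not preserved in this transcript; the values below are those used in the calculations on slides 36 to 38.)<br />Red package sales per store: 6, 8, 10, 12, 14<br />Blue package sales per store: 16, 18, 20, 22, 24<br />TOTAL: Red 50, Blue 100, All stores 150<br />MEAN: Red 10, Blue 20, All stores 15<br />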
32. 32. Analysis of Variance<br />
33. 33. Observe Important Differences<br />Sales averaged 15 packages per store in all 10 stores, but the blue package averaged 20 packages per store while the red averaged only 10 packages per store.<br />Because this could be an important difference, researchers would like to test whether the difference is statistically significant.<br />
34. 34. Null Hypothesis<br />Average Red Package sales (10) and average blue package sales (20) only represent sampling variations from the overall average sales (15) and do not represent real differences in the sales effectiveness of the two packages.<br />
35. 35. Calculate the appropriate statistic<br />The appropriate statistic is the F statistic, which is calculated using certain measures of variation.<br />Variation in a set of data is calculated by summing the square of the deviation of each item from the mean of all items.<br />$\text{Variation} = \sum_{k=1}^{t} (X_k - \bar{X})^2$<br />Where:<br />$t$ = The number of items in the set of data<br />$k$ = The kth item (k = 1, 2, 3, …, t)<br />$X_k$ = The value of item k<br />$\bar{X}$ = The mean of all t items<br />
36. 36. Calculating<br />Since the total mean of all data is 15, <br />The total variation in the data is the sum of (Xk-15)² for all ten items.<br />=(6-15)² + (8-15) ² + (10-15) ² + (12-15) ² + <br /> (14-15) ²+ (16-15) ² + (18-15) ² + (20-15) ² + <br /> (22-15) ² + (24-15) ²<br />=330<br />
37. 37. Between Column Variation<br />$= n_1(\bar{X}_1 - \text{total mean})^2 + n_2(\bar{X}_2 - \text{total mean})^2$<br />Where:<br />$n_1$ = The number of observations in column 1<br />$n_2$ = The number of observations in column 2<br />$\bar{X}_1$ = The mean of column 1<br />$\bar{X}_2$ = The mean of column 2<br />=5(10-15)² + 5(20-15)²<br />=250<br />
38. 38. Within Column Variation<br />Calculated by taking the variation of the numbers in each column relative to that column’s mean and summing over both columns.<br />(6-10)² + (8-10)² + (10-10)² +<br />(12-10)² + (14-10)² + (16-20)² +<br />(18-20)² + (20-20)² + (22-20)² +<br />(24-20)²<br />=80<br />
39. 39. Calculating Df<br />Total df=(total number of observations ) – 1<br />= 10-1<br />=9<br />Between Column df = C-1<br />=2-1<br />=1<br />Unexplained df=Total df-Between column df<br />=9-1<br />=8<br />Estimated Variance = Variation/Degrees of Freedom<br />Estimated Variance is a measure of variation per degree of freedom<br />Between column estimated variance is 250/1=250<br />Unexplained estimated variance is 80/8=10<br />
40. 40. Compare the calculated and critical values of F<br />$F = \dfrac{\text{Estimated variance associated with the different colour packages}}{\text{Unexplained estimated variance}} = \dfrac{\text{Between column estimated variance}}{\text{Unexplained estimated variance}}$<br />$F = 250/10 = 25$<br />Comparing calculated and critical values of F: for 1 df in the numerator and 8 df in the denominator, the critical F value is 5.32 at the 95% confidence level.<br />Since the F statistic of 25 is much larger than the critical F value, researchers have evidence which leads them to reject the null hypothesis and to conclude with more than 95 percent confidence that the differences observed in the two column means compared to the total mean are statistically significant and are not due to sampling variation.<br />Stated another way, the different package colours must have caused the different average sales figures.<br />Hence, by observation, the blue package was more effective than the red package.<br />
41. 41. Total Variation<br />Total Variation = Variation between columns + Variation within columns (unexplained)<br />Here 330 = 250 + 80: most of the variation is associated with the different colour packages, and therefore one of the colours must be more effective than the other.<br />
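The whole one-way analysis for the Bell Baking example can be reproduced with a short Python sketch. The store-level sales figures are the ones that appear in the total and within-column calculations; the function and variable names are illustrative, and the critical value 5.32 is the one quoted on the slide.

```python
def variation(values, mean):
    """Sum of squared deviations of each item from the given mean."""
    return sum((x - mean) ** 2 for x in values)

def one_way_anova(columns):
    """Return the variation components and F statistic for a list of columns."""
    all_items = [x for column in columns for x in column]
    total_mean = sum(all_items) / len(all_items)

    total = variation(all_items, total_mean)
    between = sum(
        len(column) * (sum(column) / len(column) - total_mean) ** 2
        for column in columns
    )
    within = sum(variation(column, sum(column) / len(column)) for column in columns)

    total_df = len(all_items) - 1
    between_df = len(columns) - 1
    unexplained_df = total_df - between_df

    f_statistic = (between / between_df) / (within / unexplained_df)
    return total, between, within, f_statistic

red = [6, 8, 10, 12, 14]     # red package sales per store
blue = [16, 18, 20, 22, 24]  # blue package sales per store

total, between, within, f_statistic = one_way_anova([red, blue])
print(total, between, within, f_statistic)
print("reject null hypothesis" if f_statistic >= 5.32 else "cannot reject null hypothesis")
```

The sketch returns a total variation of 330, split into 250 between columns and 80 within columns, and an F statistic of 25, matching the hand calculation.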
42. 42. Two way Analysis of Variance<br />In two way analysis of variance two variables are being tested.<br />The Williams Candy Company was planning to test three new candy flavours (A,B,C). <br />In the test the company wished also to measure the effect of three different retail price levels – 79 cents, 89 cents and 99 cents. <br />The company selected 9 matched but geographically separated stores as the sites for the test. These stores had similar levels of candy sales and were located in neighbourhoods with similar demographic characteristics.<br />The company arranged to have the new flavours delivered to the stores and to see the proper displaying and pricing of the candy in all stores throughout a four week period.<br />At the end of the four weeks the unsold candy was collected from the stores, and the company determined the number of cases of each flavour which was sold at each of the three prices.<br />Research Objective : Which of the flavours was most well received and what effect the different prices had, if any?<br />
43. 43. Number of Cases of New Flavours Sold at Different Prices: Outcome 1<br />Total Mean = 10<br />
44. 44. Outcome 1<br />These data show no variation at all.<br />Each cell contains the number 10.<br />The Total Mean, The Mean of each of the three rows, and the mean of each of the three columns are all =10.<br />No differences down the rows and no differences down the columns.<br />Since there are no observed differences in the data, no effect can be attributed to either the different flavours or the different prices. <br />
45. 45. Number of Cases of New Flavours Sold at Different Prices: Outcome 2<br />Total Mean = 10<br />
46. 46. Outcome 2<br />Researchers can observe differences in this set of data.<br />However, the row means, the column means and the total mean are all identical and equal to 10.<br />Work out all the components of both variation and degrees of freedom in order to complete the analysis.<br />
47. 47. Total Variation<br />Total Variation = (10-10)² + (10-10)² + (10-10)²<br />+ (4-10)² + (10-10)² + (16-10)²<br />+ (16-10)² + (10-10)² + (4-10)²<br />= 144<br />Row Variation<br />= C × ∑ (row mean i – total mean)²<br />C = number of columns<br />= 3((10-10)² + (10-10)² + (10-10)²) = 0<br />Column Variation<br />= R × ∑ (column mean i – total mean)²<br />R = number of rows<br />= 3((10-10)² + (10-10)² + (10-10)²) = 0<br />Hence, there is no variation due to the prices (rows) or the flavours (columns).<br />Total Variation = Variation of row means from total mean +<br /> Variation of column means from total mean +<br /> Variation which is unexplained<br />Component: Variation<br />Row: 0<br />Column: 0<br />Unexplained: 144<br />Total: 144<br />
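The decomposition for Outcome 2 can be checked with a few lines of Python. The cell values are taken from the total variation sum above and are assumed to be listed row by row; the original table is not preserved in this transcript, so that layout is an assumption.

```python
# Outcome 2 cells, assumed to be listed row by row (rows: prices, columns: flavours).
outcome_2 = [
    [10, 10, 10],
    [4, 10, 16],
    [16, 10, 4],
]

n_rows, n_cols = len(outcome_2), len(outcome_2[0])
cells = [x for row in outcome_2 for x in row]
total_mean = sum(cells) / len(cells)

row_means = [sum(row) / n_cols for row in outcome_2]
col_means = [sum(row[c] for row in outcome_2) / n_rows for c in range(n_cols)]

total_variation = sum((x - total_mean) ** 2 for x in cells)
row_variation = n_cols * sum((m - total_mean) ** 2 for m in row_means)
col_variation = n_rows * sum((m - total_mean) ** 2 for m in col_means)
unexplained = total_variation - (row_variation + col_variation)

print(total_variation, row_variation, col_variation, unexplained)
```

Every row mean and column mean equals 10, so all 144 units of variation end up in the unexplained component.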
48. 48. Number of Cases of New Flavours Sold at Different Prices: Possible Outcome 3<br />Total Mean = 10<br />
49. 49. Analysis of Variance Summary<br />
50. 50. Observe important differences<br />In this case, the researcher may well ask if the variation associated with flavours A, B, and C is significant – that is, if the observed differences are due to the flavours rather than to sampling variations.<br />
51. 51. Null Hypothesis<br />Average sales of 7, 10 and 13 cases for flavours A, B and C only represent sampling variations from the overall mean sales of 10 cases.<br />This hypothesis assumes that if a larger experiment were carried out, average sales for all stores would be 10 cases and average sales for each of the three flavours A, B and C would also be 10 cases.<br />The researchers wish to be 95 percent confident of the conclusion they reach regarding this hypothesis.<br />
52. 52. Calculate the appropriate statistic<br />Total Variation <br />= (8-10)² + (8-10) ² + (14-10) ² + (4-10) ² + (14-10)²+ (12-10) ²+ (9-10) ² + (8-10) ² + (13-10)²<br />=94<br />Row Variation = 3((10-10) ² + (10-10) ² + (10-10)²)<br />=0<br />Column Variation= 3((7-10) ² + (10-10) ² +(13-10) ²)<br />=54<br />Unexplained Variation = Total Variation-(row variation+ column variation)<br />=94-(0+54)<br />=40<br />
53. 53. Degrees of Freedom<br />Number of degrees of freedom associated with total variation = (R × C) - 1<br />Total df = (3 × 3) - 1 = 8<br />Row df = R-1<br /> = 3-1<br /> = 2<br />Column df = C-1<br /> = 3-1<br /> = 2<br />Unexplained df = total df – (row df + column df)<br /> = 8-(2+2)<br /> = 4<br />Estimated Variance = Variation/degrees of freedom<br />Column estimated variance = 54/2<br /> = 27<br />Unexplained estimated variance = 40/4<br /> = 10<br />
54. 54. F Statistic<br />F = Estimated Variance with the different flavours/Unexplained estimated variance<br />F = Column estimated variance/Unexplained estimated variance<br />=27/10<br />=2.70<br />
55. 55. Compare the calculated and critical values of F<br />The critical value of F for 2 df in the numerator and 4 df in the denominator, associated with 95 percent confidence, is F = 6.94.<br />The calculated F = 2.70 is smaller than this critical value.<br />If the calculated F value had been as big as 6.94, researchers would have had evidence leading them to reject their null hypothesis and to conclude with 95 percent confidence that the observed variation was statistically significant.<br />The conclusion now is that the observed sales variation associated with the columns is not large enough to say that sales were affected by the different flavours used in the test.<br />Similarly, variation due to prices could be studied if any of the row means were not equal to 10.<br />The method can be applied to any number of rows and any number of columns.<br />
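The two-way analysis of Outcome 3 can be sketched end to end in Python. The cell values come from the total variation sum on slide 52 and are assumed to be listed row by row, with rows as price levels and columns as flavours A, B and C; the function name and dictionary keys are illustrative.

```python
def two_way_anova(table):
    """Two-way analysis of variance with one observation per cell."""
    n_rows, n_cols = len(table), len(table[0])
    cells = [x for row in table for x in row]
    total_mean = sum(cells) / len(cells)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[c] for row in table) / n_rows for c in range(n_cols)]

    total_var = sum((x - total_mean) ** 2 for x in cells)
    row_var = n_cols * sum((m - total_mean) ** 2 for m in row_means)
    col_var = n_rows * sum((m - total_mean) ** 2 for m in col_means)
    unexplained_var = total_var - (row_var + col_var)

    total_df = n_rows * n_cols - 1
    row_df, col_df = n_rows - 1, n_cols - 1
    unexplained_df = total_df - (row_df + col_df)

    unexplained_estimate = unexplained_var / unexplained_df
    return {
        "total": total_var,
        "row": row_var,
        "column": col_var,
        "unexplained": unexplained_var,
        "unexplained_df": unexplained_df,
        "f_column": (col_var / col_df) / unexplained_estimate,
        "f_row": (row_var / row_df) / unexplained_estimate,
    }

# Outcome 3: rows assumed to be the three price levels, columns flavours A, B, C.
outcome_3 = [
    [8, 8, 14],
    [4, 14, 12],
    [9, 8, 13],
]

result = two_way_anova(outcome_3)
critical_f = 6.94  # 2 and 4 df at 95 percent confidence, as quoted on the slide
print(result)
print("flavour effect significant" if result["f_column"] >= critical_f
      else "flavour effect not significant")
```

Running the sketch gives a total variation of 94, a column variation of 54, an unexplained variation of 40 and an F of 2.7 for flavours, which falls short of the critical value 6.94.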