Lesson 15 Data Analysis I

If you want to learn Marketing Research Techniques using SPSS in Presentation Form...write to me at marketstrat1@gmail.com.

Ideal for Marketing Research Students and Practitioners embarking on a career in MR.

Marketing Management students in BBA and MBA courses can also benefit from this.

Costs of Presentations to suit every pocket!!!!

Published in: Business, Technology, Education

Transcript

  • 1. Data Analysis I: Tests of Significance, Sampling Statistics, Chi-Square Analysis and Analysis of Variance
    Lesson 15
  • 2. Continuous and Categorical Variables
    Continuous variables are ones which can be quantified or measured on a continuum, and which can take any value from zero to the largest number possible in the series.
    Example :
    A respondent’s age in years
    Number of miles he or she drives annually
    Percentages
    Household’s annual expenditure on all insurance
    Quantum of Coffee consumed annually by households
    Interval scales used in attitude measurement
  • 3. Two Types of Categorical Variables
    One type consists of variables which can only be measured in classes or categories.
    Such types of categorical variables are not quantifiable.
    Respondent’s sex – m/f
    Voter status – can vote, cannot vote
    Respondent’s occupation
  • 4. Two Types of Categorical Variables
    A second type of categorical variable includes ones which are more conveniently measured in categories than on a continuum.
    Different categories are associated with quantifiable numbers which show a progression from smaller values to larger values.
    Age
    Income
  • 5. Tests of Statistical Significance
    Differences in study findings have been observed between at least two distinct subpopulations, groups, classes or categories.
    Differences in consumption behavior of heavy vs light users of any product category
    By their nature, all categorical variables result in the creation of two or more classes or categories, and often the differences observed between any two of them will be subjected to a test of statistical significance.
  • 6. Tests of Statistical Significance
    Researchers will have asked the same question of each group and/or measured the same variable within each group.
    The average quantum of coffee consumed annually per household in City A and City B.
    The two average coffee consumption figures are then tested for statistical significance.
  • 7. Which method of significance testing to use in a given situation?
    1. The number and format of the categories in which differences are observed.
    If the researchers are interested in the differences between two columns or two rows of numbers, Chi-Square analysis is used to test the significance of the observed differences.
    E.g., to ascertain whether the pattern of heavy, moderate, light and non-users observed in City A differs significantly from the corresponding pattern observed in City B in coffee consumption data.
  • 8. Which method of significance testing to use in a given situation?
    Some experiments result in a single table of findings and researchers can use Analysis of Variance to test the significance of the differences observed within that one table.
  • 9. Which method of significance testing to use in a given situation?
    2. Are the figures which are recorded in each category actual counts of the number of responses or are they percentages?
    Both Chi-square analysis and analysis of variance can only be applied to the actual number of responses falling into each category, and cannot be applied to percentages.
  • 10. Which method of significance testing to use in a given situation?
    3. Whether the data were obtained through experimentation or by descriptive field surveys also influences the decision on which method to use.
    Experiments or Field Surveys (sampling statistics and chi-square)
  • 11. Four – Step procedure to determine which Test to Apply?
    1. Observe a difference which can have important implications for the Marketing Manager.
    In any set of findings researchers can observe many differences, but only a few of those differences might have important implications for the marketing manager.
    Those are the differences which should be tested for significance.
  • 12. Four – Step procedure to determine which Test to Apply?
    2. State a hypothesis which can be tested using one of the three methods – sampling statistic, chi-square analysis and analysis of variance.
    When applying significance tests, researchers often hypothesize that the observed difference is not a statistically significant one; that is, they hypothesize that the observed difference could easily have occurred by chance because of sampling variation. This is called a null hypothesis.
    Reject the Null Hypothesis if able to conclude that the observed difference is too large to have occurred by chance due to sampling variation.
    The observed difference must reflect a real difference which exists in the population being studied.
    When such evidence is obtained, conclusion is that the observed difference is statistically significant.
    If researchers are unable to obtain such evidence, they will not be able to reject their null hypothesis and they will not be able to conclude that the observed difference is statistically significant.
    Also specify the confidence level researchers would like to have when they decide whether or not to reject their null hypothesis.
  • 13. Four – Step procedure to determine which Test to Apply?
    3. Calculate the appropriate “statistic” which quantifies the observed difference relative to the sample size which was used to gather the data.
    It is always necessary to calculate a statistic which quantifies the observed difference in relationship to the sample size used in the research.
    If the observed differences are large and/or if the samples used are large, the calculated statistic will have a larger value.
    If the observed differences are small and/or if the samples used are small, the calculated statistic will have a smaller value.
  • 14. Four – Step procedure to determine which Test to Apply?
    4. Check to see if the calculated value of the statistic is large enough to allow researchers to conclude that the observed difference is statistically significant.
    If the calculated value of the statistic is larger than a certain “critical value”, then reject the null hypothesis. The critical value is read from a table of critical values of the appropriate statistic.
  • 15. If calculated value is bigger than the Critical Value …
    If the calculated value is as big as or bigger than the critical value, then the observed difference is too big to be attributable to sampling variation. The researchers will reject their null hypothesis and conclude with a 95 percent level of confidence that the observed difference is statistically significant, i.e., the observed difference reflects a real difference existing in the population studied.
  • 16. If calculated value is smaller than the Critical Value…
    If the calculated value is smaller than the critical value, researchers will not have the evidence they need to reject their null hypothesis, and so they would be unable to conclude that the observed difference is statistically significant.
  • 17. Application when Observed Differences are Two Percentages
    Sampling Statistics are commonly applied when :
    Comparing observed percentages in two different groups.
    Comparing an observed percentage with an expected percentage.
  • 18. Comparing percentages from two samples
    Observe an important difference.
    Null Hypothesis.
    Calculate the appropriate statistic:
    $Z = \dfrac{\text{After\%} - \text{Before\%}}{S_{\text{difference}}}$
    $S_{\text{difference}} = \sqrt{S_a^2 + S_b^2}$
    $S_{\text{difference}}$ = estimated standard error of the difference
    $S_a$ = estimated standard error of the after-sample percentage
    $S_b$ = estimated standard error of the before-sample percentage
    $S_p = \sqrt{pq/n}$
    $p$ = percentage of the universe with the characteristic being studied
    $q$ = (100% − p)
    $n$ = size of sample
  • 19. Comparing percentages from two samples
    Compare the calculated and critical values of Z
    Desired Confidence Level Critical Z Value
    90% 1.64
    95% 1.96
    99% 2.58
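The two-sample comparison above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the before/after percentages and sample sizes are made-up figures rather than data from the lesson, and each standard error is estimated from its own sample percentage.

```python
import math

# Critical Z values for two-tail tests, as listed on the slide.
CRITICAL_Z_TWO_TAIL = {90: 1.64, 95: 1.96, 99: 2.58}


def standard_error_pct(p, n):
    """Estimated standard error of a percentage: sqrt(p * q / n), with q = 100 - p."""
    return math.sqrt(p * (100.0 - p) / n)


def z_two_percentages(after_pct, n_after, before_pct, n_before):
    """Z = (After% - Before%) / S_difference."""
    s_a = standard_error_pct(after_pct, n_after)
    s_b = standard_error_pct(before_pct, n_before)
    s_difference = math.sqrt(s_a ** 2 + s_b ** 2)
    return (after_pct - before_pct) / s_difference


def is_significant(z, confidence):
    """Reject the null hypothesis when |Z| is as big as or bigger than the critical value."""
    return abs(z) >= CRITICAL_Z_TWO_TAIL[confidence]


# Hypothetical figures: 40% before and 50% after, 400 respondents in each sample.
z = z_two_percentages(after_pct=50.0, n_after=400, before_pct=40.0, n_before=400)
print(round(z, 2), is_significant(z, 95))
```

With these made-up figures the standard error of the difference is about 3.5 percentage points, so a 10-point difference gives a Z of roughly 2.86.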
  • 20. Comparing observed percentage with expected percentage
    Observe an important difference
    Null Hypothesis
    Calculate the appropriate statistic
    $Z = \dfrac{\text{Expected\%} - \text{Observed\%}}{S_p}$
    $S_p = \sqrt{pq/n}$
    $S_p$ = standard error of a percentage
  • 21. Comparing observed percentage with expected percentage
    Compare the calculated and critical values of Z
    Desired Confidence Level Critical Z Values for One-Tail Tests of Significance:
    90% 1.28
    95% 1.64
    99% 2.33
  • 22. One Tail Vs Two Tail Tests
    A one-tail test looks at only one tail of the distribution of sample means.
    Example: a firm will acquire a company if that company has a market share of 30%.
    The firm does a survey and computes the market share percentage. If the market share in the sample survey is less than 30%, it won't acquire the company. This is a one-tail test.
    However, if it is more than 30%, the firm would like to know how much more. This is a two-tail test.
    Here one is interested in testing the significance of findings on one side only: the lower side of the expected percent usage level. The one-sided nature of this testing causes it to be referred to as a “one-tail test”.
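As a rough sketch of the observed-versus-expected comparison with a one-tail decision, the market share example might look like this in Python. The observed share of 25% and the sample size of 400 are hypothetical, the function name is illustrative, and the standard error is computed from the expected percentage, which is an assumption the slides do not spell out.

```python
import math

# Critical Z values for one-tail tests, as listed on the slide.
CRITICAL_Z_ONE_TAIL = {90: 1.28, 95: 1.64, 99: 2.33}


def z_expected_vs_observed(expected_pct, observed_pct, n):
    """Z = (Expected% - Observed%) / S_p, with S_p = sqrt(p * q / n).

    Assumption: p is taken to be the expected percentage.
    """
    s_p = math.sqrt(expected_pct * (100.0 - expected_pct) / n)
    return (expected_pct - observed_pct) / s_p


# Hypothetical survey: expected market share 30%, observed 25%, 400 respondents.
z = z_expected_vs_observed(expected_pct=30.0, observed_pct=25.0, n=400)

# One-tail test on the lower side: is the observed share significantly below 30%?
below_at_95 = z >= CRITICAL_Z_ONE_TAIL[95]
below_at_99 = z >= CRITICAL_Z_ONE_TAIL[99]
print(round(z, 2), below_at_95, below_at_99)
```

With these figures Z is about 2.18, which exceeds the one-tail critical value at 95 percent confidence but not the one at 99 percent.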
  • 23. Application when Observed Differences are Averages
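    $S_{\bar{x}} = S/\sqrt{n}$
    Where:
    $S_{\bar{x}}$ = estimated standard error of the mean
    $S$ = standard deviation of the sample
    $n$ = number of observations in the sample
    $S_{\text{difference}} = \sqrt{(S_{\bar{x}} \text{ for City A})^2 + (S_{\bar{x}} \text{ for City B})^2}$
    $Z = \dfrac{\text{City A mean} - \text{City B mean}}{S_{\text{difference}}}$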
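The same calculation for two averages can be sketched as follows. The means, standard deviations and sample sizes for City A and City B are hypothetical, chosen only to keep the arithmetic easy to follow, and the function names are illustrative.

```python
import math


def standard_error_mean(s, n):
    """Estimated standard error of the mean: S / sqrt(n)."""
    return s / math.sqrt(n)


def z_two_means(mean_a, s_a, n_a, mean_b, s_b, n_b):
    """Z = (City A mean - City B mean) / S_difference."""
    se_a = standard_error_mean(s_a, n_a)
    se_b = standard_error_mean(s_b, n_b)
    s_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / s_difference


# Hypothetical annual coffee consumption per household (arbitrary units).
z = z_two_means(mean_a=12.0, s_a=4.0, n_a=100, mean_b=11.0, s_b=3.0, n_b=100)
print(round(z, 2))
```

Here the two standard errors are 0.4 and 0.3, the standard error of the difference is 0.5, and Z is about 2.0, just above the two-tail critical value of 1.96 at 95 percent confidence.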
  • 24. Use the t Distribution when samples are small
    If the sample is small (30 or fewer respondents), the results can easily be affected by atypical items (i.e., a few very large values and/or a few very small values). Thus the theory of the normal distribution cannot be used, since it is based on a sample sufficiently large to balance out the extreme cases.
    The means of small samples follow the t distribution.
    With the t distribution, a larger number of standard errors is needed in order to obtain a specified level of confidence.
  • 25. Chi-Square Analysis
    A Chi-square analysis can be used when:
    There are two observed sets of data, or one observed set of data and one expected set of data. Typically these data sets are in table form (R rows and C columns) or in frequency distribution form (one row and C columns, or R rows and one column).
    The two sets of data are based on the same sample size.
    Each cell in the data contains an observed or expected count which is five or larger.
    The different cells in a row or column represent either categorical variables (e.g., male/female) or continuous variable data which have been placed into classes or categories (e.g., age data placed into under 25, 25-40, over 40 categories).
    The most common uses involve :
    Two observed frequency distributions
    An observed frequency distribution with an expected one
    Two large tables of data.
  • 26. Chi Square Statistic (χ²)
    $\chi^2 = \sum_{i=1}^{k} \dfrac{(f_i - F_i)^2}{F_i}$
    Where
    $k$ = the number of cells
    $i$ = the ith cell (where i = 1, 2, 3, …, k)
    $f_i$ = the observed count in the ith cell
    $F_i$ = the “expected” count in the ith cell
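A minimal sketch of the chi-square statistic, assuming the observed and expected counts are supplied as two lists of equal length. The counts used here are hypothetical and were chosen so that both sets sum to the same sample size and every cell holds at least five.

```python
def chi_square(observed, expected):
    """Sum of (f_i - F_i)^2 / F_i over all k cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((f - F) ** 2 / F for f, F in zip(observed, expected))


# Hypothetical counts of heavy, moderate, light and non-users (200 respondents).
observed = [30, 50, 70, 50]
expected = [50, 50, 50, 50]

chi2 = chi_square(observed, expected)
print(chi2)
```

Only the first and third cells differ from expectation, each contributing 400/50 = 8, so the statistic for these made-up counts is 16.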
  • 27. Number of Degrees
    To determine the number of degrees of freedom associated with the observed set of data : (column of data consisting of 4 cells)
    In general, the degrees of freedom (d.f.) associated with column or row data consisting of k cells is :
    d.f.=k-1
    Think of degrees of freedom as the number of categories in the observed set of data that are free to vary once the total is fixed.
    d.f.=(R-1) (C-1) for R Rows and C Columns.
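The two degrees-of-freedom rules can be written as small helper functions. The helper names are hypothetical, and the critical values in the lookup are the standard chi-square table entries for 95 percent confidence; the slides do not list them.

```python
def df_frequency_distribution(k):
    """Degrees of freedom for a single row or column of k cells: k - 1."""
    return k - 1


def df_table(rows, cols):
    """Degrees of freedom for a table with R rows and C columns: (R - 1)(C - 1)."""
    return (rows - 1) * (cols - 1)


# Standard chi-square critical values at 95 percent confidence (assumed, from tables).
CHI2_CRITICAL_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def is_significant(chi2, df):
    """Reject the null hypothesis when chi-square is as big as or bigger than the critical value."""
    return chi2 >= CHI2_CRITICAL_95[df]


df_column = df_frequency_distribution(4)   # a column of data consisting of 4 cells
df_city_table = df_table(rows=4, cols=2)   # e.g. 4 usage classes by 2 cities
print(df_column, df_city_table, is_significant(16.0, df_column))
```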
  • 28. Examples where Chi-Square helps
    Data relating to different markets
    Different market segments
    Different type of packages
    Different advertising copies
    Useful for testing the significance of the difference observed between two sets of categorical data.
  • 29. Analysis of Variance
    Analysis of Variance can be applied to data using one test variable – like a test to measure the sales appeal effectiveness of three different package designs or a test to measure the elasticity of four different prices. Or there can be a combination of price and package variables.
    If only one test variable is used, it is called one-way analysis of variance or one-way ANOVA.
    If two test variables are used, it is called two-way analysis of variance or two way anova.
  • 30. One-way Analysis of Variance
    The Bell Baking Company was interested in evaluating the sales effect of two different colours (the test variable) for the package of one of its cookie products.
    The firm selected 10 stores with similar monthly sales of cookies and randomly split them into two groups of five stores each.
    One group of stores was stocked only with red packages while the other group of stores was stocked only with blue packages.
    All stores were monitored for two weeks to make certain that the packages were properly displayed and that no stock-outs occurred.
  • 31. Sales Test of Package Colours
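    [Table not captured in the transcript: package sales per store for the red and blue groups, with TOTAL and MEAN rows.]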
  • 32. Analysis of Variance
  • 33. Observe Important Differences
    Sales averaged 15 packages per store in all 10 stores but the blue package averaged 20 packages per store while red averaged only 10 packages per store.
    Because this could be an important difference, researchers would like to test whether the difference is statistically significant.
  • 34. Null Hypothesis
    Average Red Package sales (10) and average blue package sales (20) only represent sampling variations from the overall average sales (15) and do not represent real differences in the sales effectiveness of the two packages.
  • 35. Calculate the appropriate statistic
    The appropriate statistic is the F statistic, which is calculated using certain measures of variation.
    Variation in a set of data is calculated by summing the squares of the deviations of each item from the mean of all items.
    $\text{Variation} = \sum_{k=1}^{t} (X_k - \bar{X})^2$
    Where $t$ = the number of items in the set of data
    $k$ = the kth item (k = 1, 2, 3, …, t)
    $X_k$ = the value of item k
    $\bar{X}$ = the mean of all t items
  • 36. Calculating
    Since the total mean of all data is 15,
    The total variation in the data is the sum of (Xk-15)² for all ten items.
    =(6-15)² + (8-15) ² + (10-15) ² + (12-15) ² +
    (14-15) ²+ (16-15) ² + (18-15) ² + (20-15) ² +
    (22-15) ² + (24-15) ²
    =330
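The total variation can be checked with a short Python sketch. The ten sales figures are the ones that appear in the calculation above; assigning 6 to 14 to the red stores and 16 to 24 to the blue stores follows the within-column calculation later in the lesson, and the function name is illustrative.

```python
def variation(values):
    """Sum of squared deviations of each item from the mean of all items."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


red_sales = [6, 8, 10, 12, 14]
blue_sales = [16, 18, 20, 22, 24]

total_variation = variation(red_sales + blue_sales)
print(total_variation)
```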
  • 37. Between Column Variation
    $= n_1(\bar{X}_1 - \text{total mean})^2 + n_2(\bar{X}_2 - \text{total mean})^2$
    Where
    $n_1$ = the number of observations in column 1
    $n_2$ = the number of observations in column 2
    $\bar{X}_1$ = the mean of column 1
    $\bar{X}_2$ = the mean of column 2
    =5(10-15) ² + 5(20-15)²
    =250
  • 38. Within Column Variation
    The variation of the numbers in each column relative to the column mean, summed over all columns.
    (6-10) ² + (8-10) ² + (10-10) ² +
    (12-10) ² + ( 14-10) ² + (16-20) ² +
    (18-20) ² + (20-20) ² + (22-20) ² +
    (24-20)²
    =80
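A sketch of the split into between-column and within-column variation, using the package sales figures from the calculations above. The variable and function names are illustrative only.

```python
def mean(values):
    return sum(values) / len(values)


columns = {
    "red": [6, 8, 10, 12, 14],
    "blue": [16, 18, 20, 22, 24],
}

all_sales = [x for col in columns.values() for x in col]
total_mean = mean(all_sales)

# Between-column variation: n_j * (column mean - total mean)^2, summed over columns.
between = sum(len(col) * (mean(col) - total_mean) ** 2 for col in columns.values())

# Within-column variation: each item's squared deviation from its own column mean.
within = sum((x - mean(col)) ** 2 for col in columns.values() for x in col)

print(between, within, between + within)
```

The two components add up to the total variation of 330, which is a useful check on the arithmetic.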
  • 39. Calculating Df
    Total df=(total number of observations ) – 1
    = 10-1
    =9
    Between Column df = C-1
    =2-1
    =1
    Unexplained df=Total df-Between column df
    =9-1
    =8
    Estimated Variance = Variation/Degrees of Freedom
    Estimated Variance is a measure of variation per degree of freedom
    Between column estimated variance is 250/1=250
    Unexplained estimated variance is 80/8=10
  • 40. Compare the calculated and critical values of F
    $F = \dfrac{\text{Estimated variance associated with the different colour packages}}{\text{Unexplained estimated variance}}$
    $F = \dfrac{\text{Between column estimated variance}}{\text{Unexplained estimated variance}}$
    $F = 250/10 = 25$
    For 1 df in the numerator and 8 df in the denominator, the critical F value is 5.32 at the 95% confidence level.
    Since F statistic 25 is much larger than the critical F value, researchers will have evidence which will lead them to reject the null hypothesis and to conclude with more than 95 percent confidence that the differences observed in the two column means compared to the total mean are statistically significant and are not due to sampling variations.
    Stated another way , the different package colours must have caused the different average sales figures.
    Hence, by observation the blue package was more effective than the red package.
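The degrees of freedom, estimated variances and F ratio for the package test can be reproduced with a short sketch. The variation figures (250 and 80) and the critical value (5.32) are taken from the slides; the function name is hypothetical.

```python
def f_statistic(between_variation, between_df, unexplained_variation, unexplained_df):
    """F = between-column estimated variance / unexplained estimated variance."""
    between_variance = between_variation / between_df
    unexplained_variance = unexplained_variation / unexplained_df
    return between_variance / unexplained_variance


n_observations = 10   # ten stores
n_columns = 2         # red and blue packages

total_df = n_observations - 1            # 9
between_df = n_columns - 1               # 1
unexplained_df = total_df - between_df   # 8

f_value = f_statistic(250, between_df, 80, unexplained_df)

CRITICAL_F_95 = 5.32  # 1 df in the numerator, 8 df in the denominator
reject_null = f_value >= CRITICAL_F_95
print(f_value, reject_null)
```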
  • 41. Total Variation
    Total Variation = Variation between columns + Variation within Columns (unexplained)
    Most of the variation is associated with the different colour packages, and therefore one of the colours must be more effective than the other.
  • 42. Two way Analysis of Variance
    In two way analysis of variance two variables are being tested.
    The Williams Candy Company was planning to test three new candy flavours (A,B,C).
    In the test the company wished also to measure the effect of three different retail price levels – 79 cents, 89 cents and 99 cents.
    The company selected 9 matched but geographically separated stores as the sites for the test. These stores had similar levels of candy sales and were located in neighbourhoods with similar demographic characteristics.
    The company arranged to have the new flavours delivered to the stores and to see the proper displaying and pricing of the candy in all stores throughout a four week period.
    At the end of the four weeks the unsold candy was collected from the stores, and the company determined the number of cases of each flavour which was sold at each of the three prices.
    Research Objective : Which of the flavours was most well received and what effect the different prices had, if any?
  • 43. Number of Cases of New Flavours Sold at Different Prices: Outcome 1
    Total Mean = 10
  • 44. Outcome 1
    These data show no variation at all.
    Each cell contains the number 10.
    The Total Mean, The Mean of each of the three rows, and the mean of each of the three columns are all =10.
    No differences down the rows and no differences down the columns.
    Since there are no observed differences in the data, no effect can be attributed to either the different flavours or the different prices.
  • 45. Number of Cases of New Flavours Sold at Different Prices: Outcome 2
    Total Mean = 10
  • 46. Outcome 2
    Researchers can observe differences in this set of data.
    However, the row means, the column means and the total mean are all identical and equal to 10.
    Calculate all the components of both variation and degrees of freedom in order to complete the analysis.
  • 47. Total Variation
    Total Variation = (10-10)² + (10-10) ² + (10-10) ²
    + (4-10) ² + (10-10) ² + (16-10) ²
    + (16-10) ² + (10-10) ² + (4-10) ²
    = 144
    Row Variation
    $= C \sum_i (\text{row mean}_i - \text{total mean})^2$
    C = number of columns
    = 3((10-10)² + (10-10)² + (10-10)²) = 0
    Column Variation
    $= R \sum_i (\text{column mean}_i - \text{total mean})^2$
    R = number of rows
    = 3((10-10)² + (10-10)² + (10-10)²) = 0
    Hence, there is no variation due to the flavours.
    Total Variation = Variation of row means from total mean + Variation of column means from total mean + Variation which is unexplained
    Component      Variation
    Row            0
    Column         0
    Unexplained    144
    Total          144
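The Outcome 2 breakdown can be verified with a small sketch. The 3 × 3 table below is read off the total variation calculation above (three values per line of that sum); treating each line as one row of the table is an assumption, as is the function name.

```python
def two_way_variation(table):
    """Split total variation into row, column and unexplained components."""
    n_rows, n_cols = len(table), len(table[0])
    total_mean = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(col) / n_rows for col in zip(*table)]

    total = sum((x - total_mean) ** 2 for row in table for x in row)
    rows = n_cols * sum((m - total_mean) ** 2 for m in row_means)
    cols = n_rows * sum((m - total_mean) ** 2 for m in col_means)
    return {"row": rows, "column": cols,
            "unexplained": total - (rows + cols), "total": total}


# Outcome 2, as read from the total variation calculation (row layout assumed).
outcome_2 = [
    [10, 10, 10],
    [4, 10, 16],
    [16, 10, 4],
]

components = two_way_variation(outcome_2)
print(components)
```

Because every row mean and column mean equals the total mean of 10, all 144 units of variation end up in the unexplained component.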
  • 48. Number of Cases of New Flavours Sold at Different Prices: Possible Outcome 3
    Total Mean 10
  • 49. Analysis of Variance Summary
  • 50. Observe important differences
    In this case, the researcher may well ask if the variation associated with flavours A, B, and C is significant – that is, if the observed differences are due to the flavours rather than to sampling variations.
  • 51. Null Hypothesis
    Average sales of 7,10 and 13 cases for flavours A, B and C only represent sampling variations from the overall mean sales of 10 cases.
    This hypothesis assumes that if a larger experiment were carried out, average sales for all stores would be 10 cases and average sales for each of the three flavours A, B and C would also be 10 cases.
    The researchers wish to be 95 percent confident of the conclusion they reach regarding this hypothesis.
  • 52. Calculate the appropriate statistic
    Total Variation
    = (8-10)² + (8-10) ² + (14-10) ² + (4-10) ² + (14-10)²+ (12-10) ²+ (9-10) ² + (8-10) ² + (13-10)²
    =94
    Row Variation = 3((10-10) ² + (10-10) ² + (10-10)²)
    =0
    Column Variation= 3((7-10) ² + (10-10) ² +(13-10) ²)
    =54
    Unexplained Variation = Total Variation-(row variation+ column variation)
    =94-(0+54)
    =40
  • 53. Degrees of Freedom
    The number of degrees of freedom associated with total variation is (R × C) − 1
    Total df =(3x3) -1 = 8
    Row df = R-1
    = 3-1
    = 2
    Column df=C-1
    =3-1
    =2
    Unexplained df = total df – (row df + column df)
    = 8-(2+2)
    = 4
    Estimated Variance = Variation/degrees of freedom
    Column estimated variance = 54/2
    = 27
    Unexplained estimated variance = 40/4
    = 10
  • 54. F Statistic
    F = Estimated Variance with the different flavours/Unexplained estimated variance
    F = Column estimated variance/Unexplained estimated variance
    =27/10
    =2.70
  • 55. Compare the calculated and critical values of F
    The critical value of F for 2 df in the numerator and 4 df in the denominator, associated with 95 percent confidence, is
    F=6.94
    Since the calculated F = 2.70 is smaller than the critical value of 6.94, researchers do not have the evidence needed to reject their null hypothesis.
    If the calculated F value had been as big as 6.94, researchers would have had evidence leading them to reject their null hypothesis and to conclude with 95 percent confidence that the observed variation was statistically significant.
    The conclusion is therefore that the observed sales variation associated with the columns is not large enough to say that sales were affected by the different flavours used in the test.
    Similarly, variation due to prices can be studied in case any of the row means are not equal to 10.
    Two-way analysis of variance can be applied to any number of rows and any number of columns.
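The whole Outcome 3 analysis can be sketched end to end as below. The 3 × 3 table is reconstructed from the total variation calculation for this outcome, taking three values per row with prices as rows and flavours A, B, C as columns; that layout is an assumption, although it does reproduce the row means of 10 and the column means of 7, 10 and 13. The function name and dictionary keys are illustrative.

```python
def two_way_anova_columns(table):
    """Variation, degrees of freedom and the F ratio for the column effect."""
    n_rows, n_cols = len(table), len(table[0])
    total_mean = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(col) / n_rows for col in zip(*table)]

    total_var = sum((x - total_mean) ** 2 for row in table for x in row)
    row_var = n_cols * sum((m - total_mean) ** 2 for m in row_means)
    col_var = n_rows * sum((m - total_mean) ** 2 for m in col_means)
    unexplained_var = total_var - (row_var + col_var)

    total_df = n_rows * n_cols - 1
    row_df = n_rows - 1
    col_df = n_cols - 1
    unexplained_df = total_df - (row_df + col_df)

    f_columns = (col_var / col_df) / (unexplained_var / unexplained_df)
    return {"total": total_var, "row": row_var, "column": col_var,
            "unexplained": unexplained_var, "unexplained_df": unexplained_df,
            "f_columns": f_columns}


# Outcome 3 (assumed layout: rows are price levels, columns are flavours A, B, C).
outcome_3 = [
    [8, 8, 14],
    [4, 14, 12],
    [9, 8, 13],
]

result = two_way_anova_columns(outcome_3)
CRITICAL_F_95 = 6.94  # 2 df in the numerator, 4 df in the denominator
reject_null = result["f_columns"] >= CRITICAL_F_95
print(result, reject_null)
```

The same function works for any number of rows and columns, provided the unexplained variation is not zero.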