Statistical Analysis
By Rama Krishna Kompella
    1. Statistical Analysis, by Rama Krishna Kompella
    2. Relationships Between Variables
        • The relationship between variables can be explained in various ways, such as:
          – Presence / absence of a relationship
          – Directionality of the relationship
          – Strength of association
          – Type of relationship
    3. Relationships Between Variables
        • Presence / absence of a relationship
          – E.g., to study customer satisfaction levels at a fast-food restaurant, we first need to know whether the quality of food and customer satisfaction are related at all.
    4. Relationships Between Variables
        • Direction of the relationship
          – The direction of a relationship can be either positive or negative.
          – E.g., food quality perceptions are related positively to customer commitment toward a restaurant.
    5. Relationships Between Variables
        • Strength of association
          – Associations are generally categorized as nonexistent, weak, moderate, or strong.
          – E.g., quality of food is strongly associated with customer satisfaction in a fast-food restaurant.
    6. Relationships Between Variables
        • Type of association
          – How can the link between Y and X best be described?
          – There are different ways in which two variables can share a relationship:
            • Linear relationship
            • Curvilinear relationship
    7. Chi-Square (χ²) and Frequency Data
        • The data analyzed here consist of frequencies, that is, the number of individuals falling into each category. In other words, the variables are measured on a nominal scale.
        • The test statistic for frequency data is the Pearson chi-square. Its magnitude reflects the amount of discrepancy between the observed frequencies and the expected frequencies.
    8. Steps in Test of Hypothesis
        1. Determine the appropriate test
        2. Establish the level of significance, α
        3. Formulate the statistical hypothesis
        4. Calculate the test statistic
        5. Determine the degrees of freedom
        6. Compare the computed test statistic against a tabled/critical value
    9. Step 1: Determine Appropriate Test
        • Chi-square is used when both variables are measured on a nominal scale.
        • It can be applied to interval or ratio data that have been categorized into a small number of groups.
        • It assumes that the observations are randomly sampled from the population.
        • All observations are independent (an individual can appear only once in a table and there are no overlapping categories).
        • It makes no assumptions about the shape of the distribution or about the homogeneity of variances.
    10. Step 2: Establish Level of Significance
        • α is a predetermined value.
        • By convention, α is set to one of:
          – α = .05
          – α = .01
          – α = .001
    11. Step 3: Determine the Hypothesis: Whether There Is an Association or Not
        • H₀: The two variables are independent.
        • Hₐ: The two variables are associated.
    12. Step 4: Calculating Test Statistics
        • The test contrasts the observed frequency in each cell of a contingency table with its expected frequency.
        • The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e., the nominal variables are unrelated).
        • The expected frequency of a cell under independence is the product of its row frequency and column frequency, divided by the number of cases: $F_e = \frac{F_r F_c}{N}$
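The expected-frequency formula follows from the independence assumption in the null hypothesis. A short derivation, where the probability notation $P(\cdot)$ is introduced here for illustration and does not appear on the slide:

```latex
\begin{align*}
P(\text{row } r \text{ and column } c)
  &= P(\text{row } r)\, P(\text{column } c)
  && \text{(independence under } H_0\text{)} \\
  &\approx \frac{F_r}{N} \cdot \frac{F_c}{N}
  && \text{(estimated from the marginal totals)} \\
F_e &= N \cdot \frac{F_r}{N} \cdot \frac{F_c}{N} = \frac{F_r F_c}{N}
\end{align*}
```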
    13-14. Step 4: Calculating Test Statistics
        $\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$
        • $F_o$: observed frequencies
        • $F_e$: expected frequency
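The formula can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the slides: the function names `expected_frequencies` and `pearson_chi_square` and the 2x2 table are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def expected_frequencies(observed):
    """Return the table of expected counts Fe = Fr * Fc / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


def pearson_chi_square(observed):
    """Sum (Fo - Fe)^2 / Fe over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Hypothetical 2x2 table (not from the slides).
table = [[20, 30],
         [30, 20]]
print(expected_frequencies(table))  # every expected count is 25.0
print(pearson_chi_square(table))    # four cells contributing 5^2 / 25 = 1 each
```

Each cell contributes in proportion to its squared deviation from the expected count, so a few badly fitting cells can dominate the statistic.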
    15. Step 5: Determine Degrees of Freedom
        df = (R-1)(C-1)
        • R: number of levels in the row variable
        • C: number of levels in the column variable
    16. Step 6: Compare Computed Test Statistic Against a Tabled/Critical Value
        • The computed value of the Pearson chi-square statistic is compared with the critical value to determine whether the computed value is improbable.
        • The critical tabled values are based on sampling distributions of the Pearson chi-square statistic.
        • If the calculated χ² is greater than the tabled χ² value, reject H₀.
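Comparing χ² with the tabled critical value is equivalent to checking whether the tail probability (p-value) of the computed χ² falls below α. The sketch below shows one way to get that tail probability with only the Python standard library, assuming integer degrees of freedom. The names `chi2_sf` and `reject_null` are hypothetical, and the approach (closed forms for df = 1 and df = 2, then a standard recurrence for the incomplete gamma function) is a choice made here, not something the slides prescribe.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with `df` degrees of freedom > x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k = 2
        q = math.exp(-half)             # exact tail probability for df = 2
    else:
        k = 1
        q = math.erfc(math.sqrt(half))  # exact tail probability for df = 1
    # Step up two degrees of freedom at a time:
    # Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def reject_null(chi_square, df, alpha=0.05):
    """Decision rule: reject H0 when the p-value falls below alpha."""
    return chi2_sf(chi_square, df) < alpha


# The familiar tabled critical values at alpha = .05 give p close to .05.
print(chi2_sf(3.841, 1), chi2_sf(5.991, 2), chi2_sf(7.815, 3))
```

In practice a statistics package or a printed table does this lookup; the sketch only makes the link between the critical value and α explicit.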
    17. Example
        • Suppose a researcher is interested in the buying preferences of environmentally conscious consumers.
        • A questionnaire was developed and sent to a random sample of 90 voters.
        • The researcher also records the gender of each of the 90 respondents.
    18-21. Bivariate Frequency Table or Contingency Table

                       Favor   Neutral   Oppose   f row
            Male         10       10       30       50
            Female       15       15       10       40
            f column     25       25       40     n = 90

        • The cell counts are the observed frequencies.
        • f row gives the row frequencies.
        • f column gives the column frequencies.
    22. Step 1: Determine Appropriate Test
        1. Gender (2 levels), nominal
        2. Buying preference (3 levels), nominal
    23. Step 2: Establish Level of Significance
        • α = .05
    24. Step 3: Determine the Hypothesis
        • H₀: There is no difference between men and women in their opinion on pro-environmental products.
        • Hₐ: There is an association between gender and opinion on pro-environmental products.
    25-27. Step 4: Calculating Test Statistics

                       Favor        Neutral      Oppose       f row
            Men        fo = 10      fo = 10      fo = 30        50
                       fe = 13.9    fe = 13.9    fe = 22.2
            Women      fo = 15      fo = 15      fo = 10        40
                       fe = 11.1    fe = 11.1    fe = 17.8
            f column     25           25           40         n = 90

        • Men, column total 25: fe = 50 × 25 / 90 = 13.9
        • Women, column total 25: fe = 40 × 25 / 90 = 11.1
    28. Step 4: Calculating Test Statistics
        $\chi^2 = \frac{(10 - 13.89)^2}{13.89} + \frac{(10 - 13.89)^2}{13.89} + \frac{(30 - 22.2)^2}{22.2} + \frac{(15 - 11.11)^2}{11.11} + \frac{(15 - 11.11)^2}{11.11} + \frac{(10 - 17.8)^2}{17.8} = 11.03$
    29. Step 5: Determine Degrees of Freedom
        df = (R-1)(C-1) = (2-1)(3-1) = 2
    30. Step 6: Compare Computed Test Statistic Against a Tabled/Critical Value
        • α = 0.05
        • df = 2
        • Critical tabled value = 5.991
        • The test statistic, 11.03, exceeds the critical value.
        • The null hypothesis is rejected.
        • Men and women differ significantly in their opinions on pro-environmental products.
    31. SPSS Output Example

        Chi-Square Tests
                                        Value        df   Asymp. Sig. (2-sided)
        Pearson Chi-Square              11.025 (a)    2   .004
        Likelihood Ratio                11.365        2   .003
        Linear-by-Linear Association     8.722        1   .003
        N of Valid Cases                90

        a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
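The first two rows of this output can be reproduced from the gender-by-opinion table with a short standard-library script. This is a sketch for checking the numbers, not SPSS's implementation: the likelihood-ratio statistic is computed here as 2 Σ Fo ln(Fo / Fe), and the p-values use the fact that for df = 2 the chi-square tail probability is exp(-x / 2). The variable names are illustrative.

```python
import math

observed = [[10, 10, 30],   # Male:   Favor, Neutral, Oppose
            [15, 15, 10]]   # Female: Favor, Neutral, Oppose

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

pearson = 0.0
likelihood_ratio = 0.0
min_expected = float("inf")
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        pearson += (fo - fe) ** 2 / fe
        likelihood_ratio += 2 * fo * math.log(fo / fe)  # assumes every fo > 0
        min_expected = min(min_expected, fe)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 the chi-square tail probability has the closed form exp(-x / 2).
p_pearson = math.exp(-pearson / 2)
p_likelihood = math.exp(-likelihood_ratio / 2)

print(round(pearson, 3), round(likelihood_ratio, 3), df)                    # 11.025 11.365 2
print(round(p_pearson, 3), round(p_likelihood, 3), round(min_expected, 2))  # 0.004 0.003 11.11
```

The hand calculation gives 11.03 because it rounds the expected counts before squaring; the unrounded value is 11.025, as SPSS reports.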
    32. Additional Information in SPSS Output
        • Exceptions that might distort χ² assumptions:
          – Associations in some but not all categories
          – Low expected frequency per cell
        • The extent of association is not the same as statistical significance (demonstrated through an example).
    33. Another Example: Heparin Lock Placement

        Complication Incidence * Heparin Lock Placement Time Group Crosstabulation
        (Heparin Lock Time: 1 = 72 hrs, 2 = 96 hrs)

                                                                  Placement Time Group
                                                                     1          2       Total
        Had complication      Count                                  9         11         20
                              Expected Count                      10.0       10.0       20.0
                              % within Placement Time Group      18.0%      22.0%      20.0%
        Had NO complication   Count                                 41         39         80
                              Expected Count                      40.0       40.0       80.0
                              % within Placement Time Group      82.0%      78.0%      80.0%
        Total                 Count                                 50         50        100
                              Expected Count                      50.0       50.0      100.0
                              % within Placement Time Group     100.0%     100.0%     100.0%

        From Polit Text: Table 8-1
    34. Hypotheses for the Heparin Lock Example
        • H₀: There is no association between complication incidence and heparin lock placement time group. (The variables are independent.)
        • Hₐ: There is an association between complication incidence and heparin lock placement time group. (The variables are related.)
    35. More of SPSS Output

        Chi-Square Tests
                                        Value       df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                         (2-sided)     (2-sided)    (1-sided)
        Pearson Chi-Square              .250 (b)     1   .617
        Continuity Correction (a)       .063         1   .803
        Likelihood Ratio                .250         1   .617
        Fisher's Exact Test                                            .803         .402
        Linear-by-Linear Association    .248         1   .619
        N of Valid Cases                100

        a. Computed only for a 2x2 table.
        b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.00.
    36. Pearson Chi-Square
        • Pearson chi-square = .250, p = .617. Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time.
        • Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
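The Pearson and Continuity Correction rows of the SPSS output can be checked against the heparin lock crosstabulation. The sketch below assumes the correction is Yates' correction (each |Fo - Fe| is reduced by 0.5 before squaring) and uses the closed-form tail probability for df = 1. The function names `chi_square_2x2` and `p_value_df1` are illustrative.

```python
import math

observed = [[9, 11],    # had complication:    group 1 (72 hrs), group 2 (96 hrs)
            [41, 39]]   # had no complication: group 1,          group 2


def chi_square_2x2(table, continuity=False):
    """Pearson chi-square for a 2x2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i in range(2):
        for j in range(2):
            fe = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - fe)
            if continuity:
                diff = max(diff - 0.5, 0.0)  # Yates: shrink |Fo - Fe| by 0.5
            total += diff ** 2 / fe
    return total


def p_value_df1(chi_square):
    """Tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


plain = chi_square_2x2(observed)
corrected = chi_square_2x2(observed, continuity=True)
print(round(plain, 3), round(p_value_df1(plain), 3))          # 0.25 0.617
print(round(corrected, 3), round(p_value_df1(corrected), 3))  # 0.062 0.803
```

The corrected statistic is exactly 0.0625, which SPSS displays as .063. Both versions lead to the same decision here: fail to reject H₀.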
    37. More SPSS Output

        Symmetric Measures
                                                        Value   Asymp. Std.   Approx. T (b)   Approx. Sig.
                                                                Error (a)
        Nominal by Nominal     Phi                      -.050                                 .617
                               Cramer's V                .050                                 .617
        Interval by Interval   Pearson's R              -.050   .100          -.496           .621 (c)
        Ordinal by Ordinal     Spearman Correlation     -.050   .100          -.496           .621 (c)
        N of Valid Cases                                 100

        a. Not assuming the null hypothesis.
        b. Using the asymptotic standard error assuming the null hypothesis.
        c. Based on normal approximation.
    38. Phi Coefficient
        • Pearson chi-square provides information about the existence of a relationship between 2 nominal variables, but not about the magnitude of the relationship.
        • The phi coefficient is the measure of the strength of the association: $\phi = \sqrt{\frac{\chi^2}{N}}$
        • In the Symmetric Measures output for the heparin lock example, Phi = -.050 (N = 100).
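As a quick check, the phi value in the SPSS output can be recovered from the heparin lock example. The square-root formula only gives the magnitude. The sign reported by SPSS can be obtained from the cell counts of a 2x2 table with the usual cross-product form, which is not on the slide and is included here as an assumption about how the sign arises. Function names are illustrative.

```python
import math


def phi_from_chi_square(chi_square, n):
    """Magnitude of phi: sqrt(chi-square / N)."""
    return math.sqrt(chi_square / n)


def signed_phi(table):
    """Signed phi for a 2x2 table [[a, b], [c, d]]:
    (a*d - b*c) / sqrt(product of the four marginal totals)."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


heparin = [[9, 11],
           [41, 39]]
print(phi_from_chi_square(0.250, 100))  # about 0.05
print(signed_phi(heparin))              # -0.05
```

A phi of .05 in magnitude indicates a very weak association, consistent with the non-significant chi-square for this table.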
    39-40. Cramer's V
        • When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer's V.
        • If Cramer's V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.
        • $V = \sqrt{\frac{\chi^2}{N (k - 1)}}$, where N is the number of cases and k is the smaller of the number of rows and the number of columns.
        • In the Symmetric Measures output for the heparin lock example, Cramer's V = .050.
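Applying the formula to the two examples in this deck takes only a few lines. The function name `cramers_v` is illustrative. The value for the gender-by-opinion table is computed here from the reported χ² = 11.025 and N = 90 and does not appear on the slides.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi-square / (N * (k - 1))), k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (k - 1)))


# Gender (2 levels) by opinion (3 levels): chi-square = 11.025, N = 90
v_gender = cramers_v(11.025, 90, 2, 3)

# Heparin lock 2x2 table: chi-square = .250, N = 100
v_heparin = cramers_v(0.250, 100, 2, 2)

print(round(v_gender, 3), round(v_heparin, 3))  # 0.35 0.05
```

For a 2 by 2 table k - 1 = 1, so V reduces to the magnitude of phi, which is why SPSS reports .050 for both measures in the heparin lock output.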
    41. Q & A
