# T10 statistical analysis

## on Apr 11, 2012


## T10 statistical analysis: Presentation Transcript

• Statistical Analysis, by Rama Krishna Kompella
• Relationships Between Variables
  • The relationship between variables can be explained in various ways, such as:
    – Presence/absence of a relationship
    – Directionality of the relationship
    – Strength of association
    – Type of relationship
• Relationships Between Variables
  • Presence/absence of a relationship
    – E.g., if we want to study the customer satisfaction levels of a fast-food restaurant, we need to know whether the quality of food and customer satisfaction are related at all
• Relationships Between Variables
  • Direction of the relationship
    – The direction of a relationship can be either positive or negative
    – E.g., food quality perceptions are positively related to customer commitment toward a restaurant
• Relationships Between Variables
  • Strength of association
    – Associations are generally categorized as nonexistent, weak, moderate, or strong
    – E.g., quality of food is strongly associated with customer satisfaction in a fast-food restaurant
• Relationships Between Variables
  • Type of association
    – How can the link between Y and X best be described?
    – There are different ways in which two variables can share a relationship:
      • Linear relationship
      • Curvilinear relationship
• Chi-Square (χ²) and Frequency Data
  • Today the data that we analyze consist of frequencies; that is, the number of individuals falling into categories. In other words, the variables are measured on a nominal scale.
  • The test statistic for frequency data is the Pearson Chi-Square. Its magnitude reflects the amount of discrepancy between observed frequencies and expected frequencies.
• Steps in Test of Hypothesis
  1. Determine the appropriate test
  2. Establish the level of significance: α
  3. Formulate the statistical hypothesis
  4. Calculate the test statistic
  5. Determine the degrees of freedom
  6. Compare computed test statistic against a tabled/critical value
• 1. Determine Appropriate Test
  • Chi-Square is used when both variables are measured on a nominal scale.
  • It can be applied to interval or ratio data that have been categorized into a small number of groups.
  • It assumes that the observations are randomly sampled from the population.
  • All observations are independent (an individual can appear only once in a table and there are no overlapping categories).
  • It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.
• 2. Establish Level of Significance
  • α is a predetermined value
  • The conventional values are:
    • α = .05
    • α = .01
    • α = .001
• 3. Determine the Hypothesis: Whether There Is an Association or Not
  • Ho: The two variables are independent
  • Ha: The two variables are associated
• 4. Calculating Test Statistics
  • The statistic contrasts the observed frequencies in each cell of a contingency table with the expected frequencies.
  • The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e., the nominal variables are unrelated).
  • Under independence, the expected frequency of a cell is the product of its row frequency and its column frequency divided by the number of cases: $F_e = \dfrac{F_r F_c}{N}$
• 4. Calculating Test Statistics

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$

  where $F_o$ is the observed frequency and $F_e$ is the expected frequency of each cell.
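The two formulas above (expected frequency, then the summed discrepancy) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pearson_chi_square` is hypothetical, and the 2 × 2 counts are made up purely to keep the arithmetic easy to follow.

```python
def pearson_chi_square(observed):
    """Return (expected, chi2) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Fe = Fr * Fc / N for every cell
    expected = [[fr * fc / n for fc in col_totals] for fr in row_totals]

    # chi2 = sum over all cells of (Fo - Fe)^2 / Fe
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    return expected, chi2


# Made-up counts, for illustration only
table = [[20, 30], [30, 20]]
expected, chi2 = pearson_chi_square(table)
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
print(chi2)      # 4.0
```

Every cell here is 5 away from its expected count of 25, so each contributes 25/25 = 1 to the total.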
• 5. Determine Degrees of Freedom
  • df = (R − 1)(C − 1)
    – R = number of levels in the row variable
    – C = number of levels in the column variable
• 6. Compare computed test statistic against a tabled/critical value
  • The computed value of the Pearson chi-square statistic is compared with the critical value to determine whether the computed value is improbable
  • The critical tabled values are based on sampling distributions of the Pearson chi-square statistic
  • If the calculated χ² is greater than the tabled χ² value, reject Ho
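Steps 5 and 6 amount to a table lookup followed by a comparison. The sketch below assumes α = .05 and hard-codes the standard tabled critical values for df = 1 to 4; the function names and the two test statistics passed in are hypothetical, chosen only to show both outcomes.

```python
# Standard tabled critical values of chi-square at alpha = .05
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1)(C - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def reject_null(chi2, n_rows, n_cols, critical=CRITICAL_05):
    """Reject Ho when the computed statistic exceeds the tabled value."""
    df = degrees_of_freedom(n_rows, n_cols)
    return chi2 > critical[df]


print(degrees_of_freedom(2, 3))  # 2
print(reject_null(7.2, 2, 3))    # True: 7.2 > 5.991
print(reject_null(2.0, 2, 2))    # False: 2.0 < 3.841
```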
• Example
  • Suppose a researcher is interested in the buying preferences of environmentally conscious consumers.
  • A questionnaire was developed and sent to a random sample of 90 voters.
  • The researcher also collects information about the gender of the 90 respondents.
• Bivariate Frequency Table or Contingency Table

|          | Favor | Neutral | Oppose | f row  |
|----------|-------|---------|--------|--------|
| Male     | 10    | 10      | 30     | 50     |
| Female   | 15    | 15      | 10     | 40     |
| f column | 25    | 25      | 40     | n = 90 |

  • The cell entries are the observed frequencies, the "f row" column holds the row frequencies, and the "f column" row holds the column frequencies.
• 1. Determine Appropriate Test
  1. Gender (2 levels), nominal
  2. Buying preference (3 levels), nominal
• 2. Establish Level of Significance
  • Alpha of .05
• 3. Determine the Hypothesis
  • Ho: There is no difference between men and women in their opinion on pro-environmental products.
  • Ha: There is an association between gender and opinion on pro-environmental products.
• 4. Calculating Test Statistics

|          | Favor               | Neutral             | Oppose              | f row  |
|----------|---------------------|---------------------|---------------------|--------|
| Men      | fo = 10, fe = 13.9  | fo = 10, fe = 13.9  | fo = 30, fe = 22.2  | 50     |
| Women    | fo = 15, fe = 11.1  | fo = 15, fe = 11.1  | fo = 10, fe = 17.8  | 40     |
| f column | 25                  | 25                  | 40                  | n = 90 |

  • For example, fe (Men, Favor) = 50 × 25 / 90 = 13.9 and fe (Women, Favor) = 40 × 25 / 90 = 11.1
• 4. Calculating Test Statistics

$$\chi^2 = \frac{(10 - 13.89)^2}{13.89} + \frac{(10 - 13.89)^2}{13.89} + \frac{(30 - 22.2)^2}{22.2} + \frac{(15 - 11.11)^2}{11.11} + \frac{(15 - 11.11)^2}{11.11} + \frac{(10 - 17.8)^2}{17.8} = 11.03$$
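The hand calculation for the gender by opinion table can be checked cell by cell. The sketch below uses the observed counts from the slides; the variable names are illustrative. It keeps the expected counts unrounded, so the total agrees with the hand-computed 11.03 to two decimals.

```python
# Observed counts from the example (columns: Favor, Neutral, Oppose)
observed = {"Men": [10, 10, 30], "Women": [15, 15, 10]}
columns = ["Favor", "Neutral", "Oppose"]

row_totals = {group: sum(counts) for group, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

chi2 = 0.0
for group, counts in observed.items():
    for label, fo, fc in zip(columns, counts, col_totals):
        fe = row_totals[group] * fc / n       # expected frequency of the cell
        contribution = (fo - fe) ** 2 / fe    # this cell's share of chi-square
        chi2 += contribution
        print(f"{group:<6}{label:<8} fo={fo:>2}  fe={fe:5.2f}  "
              f"contribution={contribution:.3f}")

print(f"chi-square = {chi2:.3f}")  # chi-square = 11.025
```

The two "Oppose" cells contribute the most, which is where men and women differ most visibly in the table.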
• 5. Determine Degrees of Freedom
  • df = (R − 1)(C − 1) = (2 − 1)(3 − 1) = 2
• 6. Compare computed test statistic against a tabled/critical value
  • α = 0.05
  • df = 2
  • Critical tabled value = 5.991
  • Test statistic, 11.03, exceeds critical value
  • Null hypothesis is rejected
  • Men and women differ significantly in their opinions on pro-environmental products
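Instead of a tabled critical value, the same decision can be reached from a p-value. For df = 2 the upper-tail probability of the chi-square distribution has the closed form exp(−χ²/2), so no statistics library is needed. This is a sketch that holds only for df = 2; the function name is hypothetical.

```python
import math


def chi2_p_value_df2(chi2):
    """Upper-tail probability of a chi-square variable with df = 2."""
    return math.exp(-chi2 / 2)


p = chi2_p_value_df2(11.025)
print(round(p, 3))   # 0.004
print(p < 0.05)      # True: reject Ho at alpha = .05

# The tabled critical value 5.991 sits at the .05 boundary
print(round(chi2_p_value_df2(5.991), 3))  # 0.05
```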
• SPSS Output Example: Chi-Square Tests

|                              | Value      | df | Asymp. Sig. (2-sided) |
|------------------------------|------------|----|-----------------------|
| Pearson Chi-Square           | 11.025 (a) | 2  | .004                  |
| Likelihood Ratio             | 11.365     | 2  | .003                  |
| Linear-by-Linear Association | 8.722      | 1  | .003                  |
| N of Valid Cases             | 90         |    |                       |

  a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
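The slides do not define the "Likelihood Ratio" row. The sketch below assumes it is the usual likelihood-ratio chi-square, G² = 2 Σ Fo ln(Fo / Fe), and applies it to the gender by opinion counts; the variable names are illustrative. Under that assumption the result agrees with the 11.365 reported above.

```python
import math

# Observed counts from the gender-by-opinion example
observed = [[10, 10, 30],
            [15, 15, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# G^2 = 2 * sum of Fo * ln(Fo / Fe), with Fe = Fr * Fc / N
g2 = 2 * sum(
    fo * math.log(fo / (fr * fc / n))
    for row, fr in zip(observed, row_totals)
    for fo, fc in zip(row, col_totals)
)

print(round(g2, 3))  # 11.365
```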
• Additional Information in SPSS Output
  • Exceptions that might distort χ² assumptions
    – Associations in some but not all categories
    – Low expected frequency per cell
  • Extent of association is not the same as statistical significance (demonstrated through an example)
• Another Example: Heparin Lock Placement
  • Complication Incidence * Heparin Lock Placement Time Group Crosstabulation
  • Heparin Lock Time: 1 = 72 hrs, 2 = 96 hrs

| Complication Incidence |                                            | Group 1 | Group 2 | Total  |
|------------------------|--------------------------------------------|---------|---------|--------|
| Had complication       | Count                                      | 9       | 11      | 20     |
|                        | Expected Count                             | 10.0    | 10.0    | 20.0   |
|                        | % within Heparin Lock Placement Time Group | 18.0%   | 22.0%   | 20.0%  |
| Had no complication    | Count                                      | 41      | 39      | 80     |
|                        | Expected Count                             | 40.0    | 40.0    | 80.0   |
|                        | % within Heparin Lock Placement Time Group | 82.0%   | 78.0%   | 80.0%  |
| Total                  | Count                                      | 50      | 50      | 100    |
|                        | Expected Count                             | 50.0    | 50.0    | 100.0  |
|                        | % within Heparin Lock Placement Time Group | 100.0%  | 100.0%  | 100.0% |

  From Polit text: Table 8-1
• Hypotheses in the Heparin Lock Example
  • Ho: There is no association between complication incidence and heparin lock placement time. (The variables are independent.)
  • Ha: There is an association between complication incidence and heparin lock placement time. (The variables are related.)
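• More of SPSS Output: Chi-Square Tests

|                              | Value    | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|------------------------------|----------|----|-----------------------|----------------------|----------------------|
| Pearson Chi-Square           | .250 (b) | 1  | .617                  |                      |                      |
| Continuity Correction (a)    | .063     | 1  | .803                  |                      |                      |
| Likelihood Ratio             | .250     | 1  | .617                  |                      |                      |
| Fisher's Exact Test          |          |    |                       | .803                 | .402                 |
| Linear-by-Linear Association | .248     | 1  | .619                  |                      |                      |
| N of Valid Cases             | 100      |    |                       |                      |                      |

  a. Computed only for a 2x2 table
  b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.00.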
• Pearson Chi-Square
  • Pearson Chi-Square = .250, p = .617. Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time.
  • Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
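The Pearson and continuity-corrected values can be reproduced from the crosstabulation counts. The sketch below makes two assumptions not stated in the slides: the continuity correction is Yates' correction, which subtracts 0.5 from each |Fo − Fe| before squaring, and the p-value for df = 1 comes from the identity P(χ² > x) = erfc(√(x/2)). Variable and function names are illustrative.

```python
import math

# Rows: had complication / had no complication; columns: 72 hrs / 96 hrs
observed = [[9, 11],
            [41, 39]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Flatten to (Fo, Fe) pairs, with Fe = Fr * Fc / N
cells = [
    (fo, fr * fc / n)
    for row, fr in zip(observed, row_totals)
    for fo, fc in zip(row, col_totals)
]


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with df = 1."""
    return math.erfc(math.sqrt(chi2 / 2))


pearson = sum((fo - fe) ** 2 / fe for fo, fe in cells)
yates = sum((abs(fo - fe) - 0.5) ** 2 / fe for fo, fe in cells)

print(round(pearson, 3), round(p_value_df1(pearson), 3))  # 0.25 0.617
print(round(yates, 4), round(p_value_df1(yates), 3))      # 0.0625 0.803
```

Both p-values are far above .05, so neither version of the test rejects Ho.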
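• More SPSS Output: Symmetric Measures

|                      |                      | Value | Asymp. Std. Error (a) | Approx. T (b) | Approx. Sig. |
|----------------------|----------------------|-------|-----------------------|---------------|--------------|
| Nominal by Nominal   | Phi                  | -.050 |                       |               | .617         |
|                      | Cramer's V           | .050  |                       |               | .617         |
| Interval by Interval | Pearson's R          | -.050 | .100                  | -.496         | .621 (c)     |
| Ordinal by Ordinal   | Spearman Correlation | -.050 | .100                  | -.496         | .621 (c)     |
| N of Valid Cases     |                      | 100   |                       |               |              |

  a. Not assuming the null hypothesis.
  b. Using the asymptotic standard error assuming the null hypothesis.
  c. Based on normal approximation.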
• Phi Coefficient
  • Pearson Chi-Square provides information about the existence of a relationship between 2 nominal variables, but not about the magnitude of the relationship
  • The phi coefficient is the measure of the strength of the association:

$$\varphi = \sqrt{\frac{\chi^2}{N}}$$
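For the heparin lock table, the phi formula gives the magnitude .050 reported by SPSS. The square-root form cannot produce the negative sign in the output; the sketch below also computes the standard signed form for a 2 × 2 table, (ad − bc) / √(row and column totals multiplied together), which is not in the slides and is included here as an assumption about where the sign comes from.

```python
import math

# 2x2 counts from the heparin lock crosstabulation
a, b = 9, 11    # had complication: group 1, group 2
c, d = 41, 39   # had no complication: group 1, group 2
n = a + b + c + d

chi2 = 0.250    # Pearson chi-square from the SPSS output

# Magnitude from the slide formula: phi = sqrt(chi2 / N)
phi_magnitude = math.sqrt(chi2 / n)

# Signed form for a 2x2 table (assumption, not from the slides)
phi_signed = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(round(phi_magnitude, 3))  # 0.05
print(round(phi_signed, 3))     # -0.05
```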
• Cramer's V
  • When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer's V.
  • If Cramer's V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.

$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$

  where N is the number of cases and k is the smaller of the number of rows and the number of columns.
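Applying the formula to both examples in the deck shows how effect size and significance can diverge. This is a minimal sketch; the function name `cramers_v` is hypothetical, and the chi-square values are taken from the SPSS outputs shown earlier.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """V = sqrt(chi2 / (N * (k - 1))), with k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))


# Gender (2 levels) by opinion (3 levels), N = 90
print(round(cramers_v(11.025, 90, 2, 3), 3))  # 0.35

# Complication (2 levels) by heparin lock time group (2 levels), N = 100
print(round(cramers_v(0.250, 100, 2, 2), 3))  # 0.05
```

For a 2 × 2 table k − 1 = 1, so V reduces to the magnitude of phi, which is why SPSS reports .050 for both.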
• Q & As