Introductory Data Analysis & Interpretation
Stat topics

  1. Introductory Data Analysis & Interpretation
  2. Four scales of measurement
     - Nominal scale: categorical variables, e.g. gender (male, female), status (single, married)
     - Ordinal scale: categories with a meaningful order, e.g. size (small, medium, large)
     - Interval scale: equal intervals but no true zero point, e.g. temperature
     - Ratio scale: equal intervals with a true zero point, e.g. scores, weight
  3. Parametric Test
     - Parametric statistical test: a test whose model specifies certain conditions about the parameters of the population from which the research sample was drawn.
     - The meaningfulness of the results depends on the validity of these assumptions.
     - Requires measurement on at least an interval scale.
  4. Nonparametric Test
     - Nonparametric statistical test: a test that does not specify conditions about the parameters of the population from which the sample was drawn.
     - Its assumptions are fewer and much weaker.
     - It does not require measurements as strong as those required by a parametric test.
     - Also called a distribution-free test.
  5. Parametric and Nonparametric Counterparts

     | Function | Parametric | Nonparametric | Efficiency of nonparametric relative to normal-theory test |
     |---|---|---|---|
     | Test for one sample | t- or z-test | Sign test | 0.63 |
     | Difference between 2 dependent samples | t- or z-test | Wilcoxon signed-ranks | 0.95 |
     | Difference between 2 independent samples | t- or z-test | Mann-Whitney U / Wilcoxon rank-sum test | 0.95 |
     | More than 2 independent samples | One-way ANOVA, two-way ANOVA | Kruskal-Wallis, Friedman's | 0.95 |
     | Relationship between 2 variables | Linear (Pearson) correlation | Rank correlation (Spearman) | 0.91 |
  6. Formulating the Hypothesis
     - What is a hypothesis? An educated guess.
     - What is a statistical hypothesis? An assertion or conjecture concerning one or more populations.
     - Hypotheses are often statements about population parameters such as the expected value and the variance.
  7. Formulating the Hypothesis
     - The null hypothesis is a statement about the population value that will be tested.
     - The null hypothesis will be rejected only if the sample data provide substantial contradictory evidence.
  8. Formulating the Hypothesis
     - The alternative hypothesis is the hypothesis that includes all population values not covered by the null hypothesis.
     - The alternative hypothesis is deemed to be true if the null hypothesis is rejected.
  9. Formulating the Hypothesis
     - The research hypothesis (usually the alternative hypothesis):
     - The decision maker attempts to demonstrate that it is true.
     - It is deemed to be the most important to the decision maker.
     - It is not declared true unless the sample data strongly indicate that it is true.
  10. Types of Statistical Errors
      - Type I error: occurs when the null hypothesis is true and is rejected.
      - Type II error: occurs when the null hypothesis is false and is not rejected.
  11. Types of Statistical Errors (figure)
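The two error definitions above can be written as a small lookup. This is only an illustrative sketch: the function name `classify_decision` and its boolean arguments are assumptions made for this example and do not come from the slides.

```python
def classify_decision(h0_is_true: bool, h0_rejected: bool) -> str:
    """Label the outcome of a hypothesis test given the (unknown) truth of H0."""
    if h0_is_true and h0_rejected:
        # Rejecting a true null hypothesis
        return "Type I error"
    if not h0_is_true and not h0_rejected:
        # Failing to reject a false null hypothesis
        return "Type II error"
    # Either a true H0 was kept or a false H0 was rejected
    return "correct decision"


for truth in (True, False):
    for rejected in (True, False):
        print(f"H0 true={truth}, rejected={rejected}: {classify_decision(truth, rejected)}")
```

In practice the truth of H0 is never known, so the function only names the four possible outcomes; it cannot tell which one occurred in a real study.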
  12. Establishing the Decision Rule
      - The critical value is determined by the significance level.
      - It is the cutoff value for a test statistic that leads to either rejecting or not rejecting the null hypothesis.
  13. Establishing the Decision Rule
      - The significance level is the maximum probability of committing a Type I statistical error. This probability is denoted by the symbol α.
  14. Establishing the Decision Rule (figure: sampling distribution divided into a "Do not reject H0" region and a "Reject H0" region; maximum probability of committing a Type I error = α)
  15. Establishing the Critical Value as a z-Value (figure: standard normal curve centered at 0 with a rejection region of area α = 0.05; the critical value is read from the standard normal table; area labels 0.5 and 0.4 as transcribed)
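As a rough illustration of reading a critical value from the standard normal distribution, the sketch below uses Python's standard library instead of a printed table. It assumes a one-tailed test with the whole rejection region in the upper tail; the slides do not say which tail is used, and the function name `upper_tail_critical_z` is made up for this example.

```python
from statistics import NormalDist


def upper_tail_critical_z(alpha: float) -> float:
    """Return the z-value that leaves probability alpha in the upper tail."""
    # inv_cdf gives the quantile: the point with area (1 - alpha) to its left
    return NormalDist().inv_cdf(1.0 - alpha)


z_crit = upper_tail_critical_z(0.05)
print(round(z_crit, 3))  # 1.645
```

Under this assumption, a computed z statistic larger than about 1.645 would fall in the rejection region at α = 0.05.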
  16. Establishing the Decision Rule
      - The test statistic is a function of the sampled observations that provides a basis for testing a statistical hypothesis.
  17. Test Statistic in the Rejection Region (figure: standard normal curve centered at 0 with rejection region α = 0.05; area labels 0.5 and 0.4 as transcribed)
  18. Establishing the Decision Rule
      - The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one calculated from the sample.
      - It is also known as the observed significance level.
  19. Relationship Between the p-Value and the Rejection Region (figure: standard normal curve centered at 0 with rejection region α = 0.05 and p-value = 0.0036; area labels 0.5 and 0.4 as transcribed)
  20. Using the p-Value to Conduct the Hypothesis Test
      - If the p-value is less than or equal to α, reject the null hypothesis.
      - If the p-value is greater than α, do not reject the null hypothesis.
      - Example: for α = 0.05 and a p-value of 0.02 for a particular test, the null hypothesis is rejected.
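The decision rule above can be sketched in a few lines. The helper names `upper_tail_p_value` and `decide` are hypothetical, and the test is assumed to be an upper-tail z-test. The value z = 2.69 is also an assumption: the slides do not state the test statistic, but this value happens to give an upper-tail p-value close to the 0.0036 shown in the figure.

```python
from statistics import NormalDist


def upper_tail_p_value(z: float) -> float:
    """P(Z >= z) for a standard normal Z (upper-tail z-test)."""
    return 1.0 - NormalDist().cdf(z)


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


# The example from the slide: alpha = 0.05 and p-value = 0.02
print(decide(0.02, alpha=0.05))  # reject H0

# Hypothetical test statistic, chosen only for illustration
p = upper_tail_p_value(2.69)
print(round(p, 4), decide(p, alpha=0.05))
```

Comparing the p-value with α and comparing the test statistic with the critical value always lead to the same decision; the p-value simply reports how far into the tail the statistic fell.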
  21. One-Tailed Hypothesis Tests
      - A one-tailed hypothesis test is a test in which the entire rejection region is located in one tail of the test statistic's distribution.
  22. Two-Tailed Hypothesis Tests
      - A two-tailed hypothesis test is a test in which the rejection region is split between the two tails of the test statistic's distribution.
  23. Two-Tailed Hypothesis Tests (figure: distribution centered at 0 with a rejection region in each tail)
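To make the one-tailed versus two-tailed distinction concrete, the following sketch splits α equally between the two tails of a standard normal distribution. It assumes a z-test; the function names are illustrative and not taken from the slides.

```python
from statistics import NormalDist


def two_tailed_critical_z(alpha: float) -> float:
    """Positive critical value; the rejection region is |z| >= this value."""
    # Each tail receives alpha / 2
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def two_tailed_p_value(z: float) -> float:
    """Probability of a statistic at least as extreme as z in either direction."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


print(round(two_tailed_critical_z(0.05), 2))  # 1.96
print(round(two_tailed_p_value(1.96), 3))     # 0.05
```

For the same α, the two-tailed cutoff (about 1.96) is farther from zero than the one-tailed cutoff (about 1.645), because each tail holds only half of α.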
  24. Choosing the Correct Statistical Test

      | Number of dependent variables | Number of independent variables | Type of dependent variable(s) | Type of independent variable(s) | Measure | Test(s) |
      |---|---|---|---|---|---|
      | 1 | 0 (1 population) | continuous normal | not applicable (none) | mean | one-sample t-test |
      | | | continuous non-normal | | median | one-sample median |
      | | | categorical | | proportions | chi-square goodness-of-fit, binomial test |
      | 1 | 1 (2 independent populations) | normal | 2 categories | mean | 2 independent sample t-test |
      | | | non-normal | | medians | Mann-Whitney, Wilcoxon rank sum test |
      | | | categorical | | proportions | chi-square test, Fisher's exact test |
  25. Choosing the Correct Statistical Test

      | Number of dependent variables | Number of independent variables | Type of dependent variable(s) | Type of independent variable(s) | Measure | Test(s) |
      |---|---|---|---|---|---|
      | 1 | 0 (1 population measured twice) or 1 (2 matched populations) | normal | not applicable / categorical | means | paired t-test |
      | | | non-normal | | medians | Wilcoxon signed ranks test |
      | | | categorical | | proportions | McNemar, chi-square test |
      | 1 | 1 (3 or more populations) | normal | categorical | means | one-way ANOVA |
      | | | non-normal | | medians | Kruskal-Wallis |
      | | | categorical | | proportions | chi-square test |
  26. Choosing the Correct Statistical Test

      | Number of dependent variables | Number of independent variables | Type of dependent variable(s) | Type of independent variable(s) | Measure | Test(s) |
      |---|---|---|---|---|---|
      | 1 | 2 or more (e.g., 2-way ANOVA) | normal | categorical | means | factorial ANOVA |
      | | | non-normal | | medians | Friedman test |
      | | | categorical | | proportions | log-linear, logistic regression |
      | 1 | 0 (1 population measured 3 or more times) | normal | not applicable | means | repeated measures ANOVA |
  27. Choosing the Correct Statistical Test

      | Number of dependent variables | Number of independent variables | Type of dependent variable(s) | Type of independent variable(s) | Test(s) |
      |---|---|---|---|---|
      | 1 | 1 | normal | continuous | correlation, simple linear regression |
      | | | non-normal | | non-parametric correlation |
      | | | categorical | categorical or continuous | logistic regression |
      | | | | continuous | discriminant analysis |
      | 1 | 2 or more | normal | continuous | multiple linear regression |
      | | | non-normal | | |
      | | | categorical | | logistic regression |
      | | | normal | mixed categorical and continuous | analysis of covariance, general linear models (regression) |
      | | | non-normal | | |
      | | | categorical | | logistic regression |
  28. Choosing the Correct Statistical Test

      | Number of dependent variables | Number of independent variables | Type of dependent variable(s) | Type of independent variable(s) | Test(s) |
      |---|---|---|---|---|
      | 2 | 2 or more | normal | categorical | MANOVA |
      | 2 or more | 2 or more | normal | continuous | multivariate multiple linear regression |
      | 2 sets of 2 or more | 0 | normal | not applicable | canonical correlation |
      | 2 or more | 0 | normal | not applicable | factor analysis |
  29. Some statistical tests: One-sample t-test
      - A one-sample t-test allows us to test whether a sample mean (of a normally distributed interval variable) differs significantly from a hypothesized value. For example, say we wish to test whether the average writing score (write) differs significantly from 50.
  30. Some statistical tests: One-sample t-test (results)
      - The mean of the variable write for this particular sample of students is 52.775, which is statistically significantly different from the test value of 50. We would conclude that this group of students has a mean writing score significantly higher than 50.
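A minimal sketch of the one-sample t statistic follows. The scores are hypothetical and are not the data behind the slide's result (the deck does not include the raw data); `one_sample_t` is a name chosen for this example. Turning the statistic into a p-value requires the t distribution with n - 1 degrees of freedom, which is left to a statistics package or a t table.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean uses the sample standard deviation (n - 1 divisor)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Hypothetical writing scores, for illustration only
scores = [52, 55, 49, 58, 51, 54]
t_stat, df = one_sample_t(scores, mu0=50)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

A positive t means the sample mean lies above the hypothesized value, as in the slide's conclusion that the group scores higher than 50.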
  31. Some statistical tests: Chi-square goodness of fit
      - A chi-square goodness-of-fit test allows us to test whether the observed proportions for a categorical variable differ from hypothesized proportions. For example, suppose we believe that the general population is 10% Hispanic, 10% Asian, 10% African American and 70% White. We want to test whether the observed proportions from our sample differ significantly from these hypothesized proportions.
  32. Chi-square goodness of fit (results)
      - These results show that the racial composition in our sample does not differ significantly from the hypothesized values that we supplied (chi-square with three degrees of freedom = 5.029, p = .170).
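The goodness-of-fit statistic is the sum of (observed - expected)² / expected over the categories. In the sketch below, the hypothesized proportions come from the slide, but the observed counts are an assumption: the slides do not list them, and these illustrative counts were chosen only because they are consistent with the reported statistic of 5.029. The function name is likewise made up for the example.

```python
def chi_square_gof(observed, expected_props):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    n = sum(observed)
    # Expected count in each category under the hypothesized proportions
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Assumed counts for Hispanic, Asian, African American, White (illustrative only)
observed = [24, 11, 20, 145]
hypothesized = [0.10, 0.10, 0.10, 0.70]

stat, df = chi_square_gof(observed, hypothesized)
print(f"chi-square = {stat:.3f} with {df} degrees of freedom")
```

Four categories give three degrees of freedom, matching the slide.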
  33. Some statistical tests: Two independent samples t-test
      - An independent samples t-test is used when you want to compare the means of a normally distributed interval dependent variable for two independent groups. For example, say we wish to test whether the mean for write is the same for males and females.
  34. Two independent samples t-test (results)
      - The results indicate that there is a statistically significant difference between the mean writing scores of males and females (t = -3.734, p < .001). In other words, females have a statistically significantly higher mean writing score (54.99) than males (50.12).
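As a sketch of how the t statistic for two independent groups is formed, the code below uses the pooled-variance version, which assumes equal population variances. The slides do not say which variant produced t = -3.734, and both samples here are hypothetical; `pooled_t` is an illustrative name.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) using a pooled variance estimate."""
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Weighted average of the two sample variances
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error
    return t, na + nb - 2


# Hypothetical writing scores, for illustration only
males = [48, 52, 50, 47, 53]
females = [55, 57, 52, 58, 53]

t_stat, df = pooled_t(males, females)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The sign of t depends only on the order of the groups: computing males minus females gives a negative value when females score higher, which is how a negative t can accompany a higher female mean.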
  35. Some statistical tests: Chi-square test
      - A chi-square test is used when you want to see if there is a relationship between two categorical variables. Let's see if there is a relationship between the type of school attended (schtyp) and students' gender (female). Remember that the chi-square test assumes that the expected value for each cell is five or higher. This assumption is easily met in this example. If it is not met in your data, use Fisher's exact test instead.
  36. Chi-square test (results)
      - These results indicate that there is no statistically significant relationship between the type of school attended and gender (chi-square with one degree of freedom = 0.047, p = 0.828).
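The sketch below computes the chi-square statistic for a two-way table from the row and column totals. The cell counts are assumed for illustration (the slides do not show the table) and were picked to be consistent with the reported statistic of 0.047. For one degree of freedom the p-value has a closed form, because a chi-square variable with 1 df is the square of a standard normal; the helper `chi_square_p_value_df1` relies on that fact and is valid only for df = 1.

```python
import math


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Assumed counts: rows = school type (public, private), columns = (male, female)
table = [[77, 91],
         [14, 18]]

stat, df = chi_square_independence(table)
p_value = chi_square_p_value_df1(stat)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

With these assumed counts the smallest expected cell count is about 14.6, so the rule of thumb that every expected count be at least five is satisfied.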
  37. Some statistical tests: One-way ANOVA
      - A one-way analysis of variance (ANOVA) is used when you have a categorical independent variable (with two or more categories) and a normally distributed interval dependent variable, and you wish to test for differences in the means of the dependent variable broken down by the levels of the independent variable. For example, we wish to test whether the mean of write differs between the three program types (prog).
  38. One-way ANOVA (results)
      - From this we can see that the students in the academic program have the highest mean writing score, while students in the vocational program have the lowest.
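A one-way ANOVA compares the variation between group means with the variation within groups. The following sketch computes the F statistic by hand; the three small groups are hypothetical stand-ins for the program types and do not reproduce the slide's data, and `one_way_anova_f` is an illustrative name.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical writing scores by program type, for illustration only
general = [50, 52, 51]
academic = [56, 58, 57]
vocational = [46, 47, 45]

f_stat, df_b, df_w = one_way_anova_f([general, academic, vocational])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F says that at least one group mean differs from the others; it does not say which one, so the group means still have to be inspected, as the slide does.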
  39. Some statistical tests: Correlation
      - A correlation is useful when you want to see the relationship between two (or more) normally distributed interval variables. For example, we can run a correlation between two continuous variables, read and write.
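The Pearson correlation coefficient can be computed directly from deviations about the means. This is a sketch with hypothetical read and write scores (the deck does not include the data), and `pearson_r` is a name chosen for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical reading and writing scores, for illustration only
read = [40, 45, 50, 55, 60]
write = [42, 50, 48, 57, 58]

r = pearson_r(read, write)
print(f"r = {r:.3f}")
```

The coefficient always lies between -1 and 1, with values near 1 indicating a strong positive linear relationship.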
  40. Some statistical tests: Simple linear regression
      - Simple linear regression allows us to look at the linear relationship between one normally distributed interval predictor and one normally distributed interval outcome variable. For example, we wish to look at the relationship between writing scores (write) and reading scores (read); in other words, predicting write from read.
  41. Simple linear regression (results)
      - We see that the relationship between write and read is positive (.552), and based on the t-value (10.47) and p-value (< 0.001) we would conclude that this relationship is statistically significant. Hence, there is a statistically significant positive linear relationship between reading and writing.
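To show what "predicting write from read" means computationally, here is a least-squares sketch. The data are hypothetical and do not reproduce the slide's coefficient of .552; the function names are assumptions made for the example.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_value, slope, intercept):
    """Predicted outcome for a given predictor value."""
    return intercept + slope * x_value


# Hypothetical reading and writing scores, for illustration only
read = [40, 45, 50, 55, 60]
write = [42, 50, 48, 57, 58]

slope, intercept = fit_simple_linear_regression(read, write)
print(f"write = {intercept:.2f} + {slope:.2f} * read")
print(f"predicted write at read = 50: {predict(50, slope, intercept):.1f}")
```

A positive slope means that higher reading scores go with higher predicted writing scores, which is the sense in which the slide calls the relationship positive.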
  42. Hypothesis Testing Joke
      - This joke is from Mark Eakin (eakin@omega.uta.edu).
      - Most of you do not know that when Santa was a young man he had to take a statistics course. When the class started covering two-sided hypothesis tests, he had a lot of trouble remembering where to put the equal sign. He started repeating to himself, "The equal sign goes in the null hypothesis. The equal sign goes in the null hypothesis. The equal sign goes in the null hypothesis."
      - Eventually Santa had to shorten this phrase to make it easier to remember. In fact, to this day you can still hear him say "Ho, Ho, Ho."
  43. THANK YOU! Have a statistically significant day!