This rule has been discussed earlier. Emphasize that there is still 0.3% of the distribution falling outside the 3 standard deviation limits.
Transcript of "Basics of statistics"
1.
Basic Statistics
Fundamentals of Hypothesis Testing: One-Sample, Two-Sample Tests Chap 9-1
2.
What is biostatistics?
Statistics is the science and art of collecting, summarizing, and analyzing data that are subject to random variation.
Biostatistics is the application of statistics and mathematical methods to the design and analysis of health, biomedical, and biological studies.
3.
Different Tests of Significance
1. One-sample z-test or t-test
   a. Compares one sample mean with a population mean
2. Two-sample t-test
   a. Compares one sample mean with another sample mean
   b. Independent t-test (independent samples)
   c. Dependent t-test (dependent/paired samples)
3. One-way analysis of variance (ANOVA)
   a. Compares several sample means
4.
How to properly use biostatistics
Develop an underlying question of interest
Generate a hypothesis
Design a study (protocol)
Collect data
Analyze data
   Descriptive statistics
   Statistical inference
5.
Relationship between population and sample (simple random sampling)
6.
Sampling Techniques (from population to sample)
Simple random sample: bias-free sample
Stratified random sample: bias-free sample
Systematic sampling: biased sample
Cluster sampling: bias-free sample
Convenience sampling: biased sample
7.
Example
How are my 10 patients doing after I put them on an anti-hypertensive medication?
Describe the results of your 10 patients.
8.
Example
What is the in-hospital mortality rate after open heart surgery at SAL hospital so far this year?
   Describe the mortality.
What is the in-hospital mortality after open heart surgery likely to be this year, given results from last year?
   Estimate the probability of death for patients like those seen in the previous year.
9.
Misuse of statistics
About 25% of biological research is flawed because of incorrect conclusions drawn from confounded experimental designs and misuse of statistical methods.
11.
Hypothesis Testing Process
Assume the population mean age is 50 (H0: µ = 50).
Identify the population.
Take a sample (X̄ = 20).
Is X̄ = 20 likely if µ = 50? No, not likely!
REJECT the null hypothesis.
12.
Reason for Rejecting H0
Sampling distribution of X̄ (if H0 is true, centered at µ = 50; the observed sample mean is 20)
It is unlikely that we would get a sample mean of this value if in fact this were the population mean.
Therefore, we reject the null hypothesis that µ = 50.
13.
Components of Biostatistics
Biostatistics
   Descriptive
   Statistical inference
      Estimation: confidence intervals
      Hypothesis testing: p-values
14.
Normal Distribution
A variable is said to be normally distributed, or to have a normal distribution, if its distribution has the shape of a normal curve.
15.
Normal distribution
Bell-shaped
Symmetrical about the mean (no skewness)
Total area under the curve = 1
Approximately 68% of the distribution is within one standard deviation of the mean
Approximately 95% of the distribution is within two standard deviations of the mean
Approximately 99.7% of the distribution is within three standard deviations of the mean
Mean = Median = Mode
16.
Empirical Rule
About 68% of the area lies within 1 standard deviation of the mean
About 95% of the area lies within 2 standard deviations of the mean
About 99.7% of the area lies within 3 standard deviations of the mean
[Figure: normal curve with the axis marked at −3σ, −2σ, −σ, µ, +σ, +2σ, +3σ]
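The three percentages of the empirical rule can be checked numerically. The following is a minimal sketch, not part of the original slides, assuming Python's standard-library `statistics.NormalDist`; the helper name `area_within` is made up for illustration.

```python
from statistics import NormalDist

def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k standard deviations."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The values come out close to 0.68, 0.95 and 0.997, so roughly 0.3% of the distribution still lies beyond three standard deviations.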
18.
Level of Significance, α
Is designated by α (level of significance)
Typical values are .01, .05, .10
Is selected by the researcher at the beginning
Provides the critical value(s) of the test
19.
The z-Test for Comparing Population Means
Critical values for standard normal distribution
20.
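Level of Significance and the Rejection Region
"I claim that the mean CVD in India is at least 3!"
H0: µ ≥ 3, H1: µ < 3: rejection region of area α in the left tail
H0: µ ≤ 3, H1: µ > 3: rejection region of area α in the right tail
H0: µ = 3, H1: µ ≠ 3: rejection regions of area α/2 in each tail
The critical value(s) mark the boundary of the rejection region(s).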
21.
Hypothesis Testing
1. State the research question.
2. State the statistical hypothesis.
3. Set decision rule.
4. Calculate the test statistic.
5. Decide if result is significant.
6. Interpret result as it relates to your research question.
22.
Rejection & Nonrejection Regions
"I claim that the mean CVD in India is at least 3!"
                   Two-tailed test   Left-tailed test   Right-tailed test
Sign in Ha         ≠                 <                  >
Rejection region   Both sides        Left side          Right side
23.
The Null Hypothesis, H0
States the assumption (numerical) to be tested
   e.g.: The average number of CVD in India is at least three (H0: µ ≥ 3)
Is always about a population parameter (H0: µ ≥ 3), not about a sample statistic (H0: X̄ ≥ 3)
24.
The Null Hypothesis, H0 (continued)
Begins with the assumption that the null hypothesis is true
Similar to the notion of innocent until proven guilty
25.
The Alternative Hypothesis, H1
Is the opposite of the null hypothesis
   e.g.: The average number of CVD in India is less than 3 (H1: µ < 3)
Never contains the "=" sign
May or may not be accepted
26.
General Steps in Hypothesis Testing
e.g.: Test the assumption that the true mean number of CVD in India is at least three (σ known)
1. State the H0: H0: µ ≥ 3
2. State the H1: H1: µ < 3
3. Choose α: α = .05
4. Choose n: n = 100
5. Choose test: Z test
27.
General Steps in Hypothesis Testing (continued)
6. Set up critical value(s): Z = −1.645 (reject H0 if the test statistic falls below it)
7. Collect data: 100 persons surveyed
8. Compute test statistic and p-value: computed test statistic = −2, p-value = .0228
9. Make statistical decision: reject the null hypothesis
10. Express conclusion: the true mean number of CVD is less than 3 in the human population.
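Steps 6 to 9 of this example can be reproduced from the two numbers the slide quotes (α = .05 and a test statistic of −2). This is a hedged sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative, and the raw survey data are not available, so the test statistic is taken as given.

```python
from statistics import NormalDist

alpha = 0.05
z_stat = -2.0  # computed test statistic as quoted on the slide

std_normal = NormalDist()
critical_value = std_normal.inv_cdf(alpha)  # left-tail cutoff for H1: mu < 3
p_value = std_normal.cdf(z_stat)            # P(Z <= -2) under H0
reject_h0 = z_stat < critical_value

print(round(critical_value, 3), round(p_value, 4), reject_h0)
```

The cutoff rounds to −1.645 and the p-value to .0228, matching the slide, so both the critical-value rule and the p-value rule lead to rejecting H0.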
28.
The z-Test for Comparing Population Means
Critical values for standard normal distribution
29.
p-Value Approach to Testing
Convert sample statistic (e.g. X̄) to test statistic (e.g. Z, t or F statistic)
Obtain the p-value from a table or computer
Compare the p-value with α
   If p-value ≥ α, do not reject H0
   If p-value < α, reject H0
30.
Comparison of Critical-Value & P-Value Approaches
Critical-Value Approach
Step 1 State the null and alternative hypothesis.
Step 2 Decide on the significance level, α.
Step 3 Compute the value of the test statistic.
Step 4 Determine the critical value(s).
Step 5 If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
Step 6 Interpret the result of the hypothesis test.
P-Value Approach
Step 1 State the null and alternative hypothesis.
Step 2 Decide on the significance level, α.
Step 3 Compute the value of the test statistic.
Step 4 Determine the P-value.
Step 5 If P < α, reject H0; otherwise, do not reject H0.
Step 6 Interpret the result of the hypothesis test.
31.
Result Probabilities
Jury Trial (H0: Innocent)
Verdict            The Truth: Innocent   The Truth: Guilty
Innocent           Correct               Error
Guilty             Error                 Correct
Hypothesis Test
Decision           H0 True               H0 False
Do Not Reject H0   Correct (1 − α)       Type II Error (β)
Reject H0          Type I Error (α)      Power (1 − β)
32.
Type I & II Errors Have an Inverse Relationship
If you reduce the probability of one error, the probability of the other increases, provided everything else is unchanged.
[Figure: overlapping sampling distributions with the areas α and β shaded]
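The trade-off can be made concrete with a small calculation. The sketch below is not from the slides: it borrows µ0 = 368, σ = 15 and n = 25 from the cereal-box example later in this transcript and assumes a purely hypothetical true mean of 375, then computes β for an upper-tail z test at several values of α. The function name `type_ii_error` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, mu0=368.0, mu_true=375.0, sigma=15.0, n=25):
    """beta for the upper-tail z test of H0: mu <= mu0 when the true mean is mu_true.

    mu_true = 375 is a hypothetical value chosen only to illustrate the trade-off.
    """
    se = sigma / sqrt(n)  # standard error of the sample mean
    # Reject H0 when the sample mean exceeds this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta = P(sample mean <= cutoff) when the true mean is mu_true.
    return NormalDist(mu_true, se).cdf(cutoff)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}  beta = {type_ii_error(alpha):.3f}")
```

With n, σ and the true mean held fixed, lowering α pushes the cutoff to the right and β grows, which is the inverse relationship the slide describes.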
33.
Critical Values Approach to Testing
Convert sample statistic (e.g.: X̄) to test statistic (e.g.: Z, t or F statistic)
Obtain critical value(s) for a specified α from a table or computer
   If the test statistic falls in the critical region, reject H0
   Otherwise do not reject H0
34.
One-tail Z Test for Mean (σ Known)
Assumptions
   Population is normally distributed
   If not normal, requires large samples
   Null hypothesis has ≤ or ≥ sign only
Z test statistic
   $Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
35.
Rejection Region
H0: µ ≥ µ0, H1: µ < µ0: reject H0 in the left tail (area α). Z must be significantly below 0 to reject H0.
H0: µ ≤ µ0, H1: µ > µ0: reject H0 in the right tail (area α). Small values of Z don't contradict H0. Don't reject H0!
36.
Example: One Tail Test
Q. Does an average box of cereal contain more than 368 grams of cereal? A random sample of 25 boxes showed X̄ = 372.5. The company has specified σ to be 15 grams. Test at the α = 0.05 level.
H0: µ ≤ 368
H1: µ > 368
37.
Finding Critical Value: One Tail
What is Z given α = 0.05? (σZ = 1; area .95 to the left of the critical value, α = .05 to the right)
Standardized Cumulative Normal Distribution Table (Portion)
Z     .04     .05     .06
1.6   .9495   .9505   .9515
1.7   .9591   .9599   .9608
1.8   .9671   .9678   .9686
1.9   .9738   .9744   .9750
Critical value = 1.645
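The table lookup can be mirrored in code. This is a small sketch assuming Python's standard-library `statistics.NormalDist` in place of the printed table: the cumulative areas for Z = 1.64 and Z = 1.65 bracket .95, and the inverse of the cumulative distribution gives the critical value directly.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Row 1.6, columns .04 and .05 of the table: the areas on either side of 0.95.
area_164 = std_normal.cdf(1.64)
area_165 = std_normal.cdf(1.65)

# The inverse cumulative distribution returns the Z that leaves alpha = 0.05 in the right tail.
critical_value = std_normal.inv_cdf(1 - 0.05)

print(round(area_164, 4), round(area_165, 4), round(critical_value, 3))
```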
38.
Example Solution: One Tail Test
H0: µ ≤ 368
H1: µ > 368
α = 0.05
n = 25
Critical value: 1.645
Test statistic: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = 1.50$
Decision: Do not reject at α = .05 (1.50 lies below the critical value 1.645)
Conclusion: No evidence that the true mean is more than 368
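The arithmetic behind the test statistic can be sketched as follows, using only the numbers given in the example (X̄ = 372.5, µ0 = 368, σ = 15, n = 25). The helper name `z_statistic` is illustrative, and the critical value is taken from Python's standard-library `statistics.NormalDist` rather than the printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

z = z_statistic(372.5, 368, 15, 25)          # (372.5 - 368) / (15 / 5) = 4.5 / 3
critical_value = NormalDist().inv_cdf(0.95)  # upper-tail cutoff at alpha = 0.05
reject_h0 = z > critical_value

print(round(z, 2), round(critical_value, 3), reject_h0)
```

Z comes out at 1.50, below the cutoff of about 1.645, so H0 is not rejected.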
39.
p-Value Solution
p-value is P(Z ≥ 1.50) = 0.0668
Use the alternative hypothesis to find the direction of the rejection region.
From the Z table: look up 1.50 (the Z value of the sample statistic) to obtain .9332
1.0000 − .9332 = .0668
40.
p-Value Solution (continued)
(p-value = 0.0668) ≥ (α = 0.05): do not reject.
Test statistic 1.50 is in the do-not-reject region (below the critical value 1.645).
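The upper-tail p-value can be obtained without a table. A minimal sketch, again assuming Python's standard-library `statistics.NormalDist`, with illustrative variable names:

```python
from statistics import NormalDist

z = 1.50
alpha = 0.05

p_value = 1 - NormalDist().cdf(z)  # P(Z >= 1.50), the area in the right tail
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)
```

The p-value rounds to .0668, which is not below α = .05, so the decision agrees with the critical-value approach.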
41.
Example: Two-Tail Test
Q. Does an average box of cereal contain 368 grams of cereal? A random sample of 25 boxes showed X̄ = 372.5. The company has specified σ to be 15 grams. Test at the α = 0.05 level.
H0: µ = 368
H1: µ ≠ 368
42.
Example Solution: Two-Tail Test
H0: µ = 368
H1: µ ≠ 368
α = 0.05
n = 25
Critical value: ±1.96 (area .025 in each tail)
Test statistic: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.50$
Decision: Do not reject at α = .05
Conclusion: No evidence that the true mean is not 368
43.
p-Value Solution
p-value = 2 × 0.0668 = 0.1336
(p-value = 0.1336) ≥ (α = 0.05): do not reject.
Test statistic 1.50 is in the do-not-reject region (between −1.96 and 1.96).
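For the two-tail version, the tail area is doubled and the cutoffs sit at ±1.96. The sketch below assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

z = 1.50
alpha = 0.05
std_normal = NormalDist()

p_value = 2 * (1 - std_normal.cdf(abs(z)))         # area in both tails beyond |z|
critical_value = std_normal.inv_cdf(1 - alpha / 2)  # alpha / 2 = .025 in each tail
reject_h0 = abs(z) > critical_value

print(round(p_value, 4), round(critical_value, 2), reject_h0)
```

The p-value rounds to .1336 and the critical value to 1.96, so H0 is not rejected under either rule.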
44.
Connection to Confidence Intervals
For X̄ = 372.5, σ = 15 and n = 25, the 95% confidence interval is:
$372.5 - (1.96)\frac{15}{\sqrt{25}} \le \mu \le 372.5 + (1.96)\frac{15}{\sqrt{25}}$
or 366.62 ≤ µ ≤ 378.38
If this interval contains the hypothesized mean (368), we do not reject the null hypothesis. It does. Do not reject.
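The interval and the check against the hypothesized mean can be sketched as below. The function name `z_confidence_interval` is made up for illustration, and the multiplier comes from Python's standard-library `statistics.NormalDist` instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(372.5, 15, 25)
contains_368 = low <= 368 <= high

print(round(low, 2), round(high, 2), contains_368)
```

The limits round to 366.62 and 378.38, and 368 lies inside, which is the same decision as the two-tail test at α = 0.05.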
45.
What is a t Test?
Commonly used definition: Comparing two means to see if they are significantly different from each other
Technical definition: Any statistical test that uses the t family of distributions
46.
Independent Samples t Test
Use this test when you want to compare the means of two independent samples on a given variable
• "Independent" means that the members of one sample do not include, and are not matched with, members of the other sample
Example:
• Compare the average height of 50 randomly selected men to that of 50 randomly selected …
[Diagram: Independent Mean #1 and Independent Mean #2, compared using a t test]
47.
Dependent Samples t Test
Used to compare the means of a single sample or of two matched or paired samples
Example:
• If a group of students took a math test in March and that same group of students took the same math test two months later in May, we could compare their average scores on the two test dates using a dependent samples t test
48.
Comparing the Two t Tests
Independent Samples
   Tests the equality of the means from two independent groups
   Relies on the t distribution to produce the probabilities used to test statistical significance
   [Diagram: Person #1 in the treatment group, Person #2 in the control group]
Dependent Samples
   Tests the equality of the means between related groups or of two variables within the same group
   Relies on the t distribution to produce the probabilities used to test statistical significance
   [Diagram: Person #1 before treatment, Person #1 after treatment]
49.
Types
One sample: compare with population
Unpaired: compare with control
Paired: same subjects, pre-post
Z-test: large samples (> 30)
50.
Compare Means (or medians)
Example: Compare blood pressures of two or more groups, or compare BP of one group with a theoretical value.
1 Group:
1. One sample t test
2. Wilcoxon signed rank test
2 Groups:
1. Unpaired t test
2. Paired t test
3. Mann-Whitney test
4. Welch's corrected t test
5. Wilcoxon matched pairs test
51.
3-26 Groups:
1. One-way ANOVA
2. Repeated measures ANOVA
3. Kruskal-Wallis test
4. Friedman test
(All with post tests)
Data can be entered as raw data, as averaged data (mean, SD, & N), or as averaged data (mean, SEM, & N).
52.
Is there a difference between you… means, who is meaner?
53.
Statistical Analysis
[Figure: control group mean and treatment group mean]
Is there a difference?
54.
What does difference mean?
The mean difference is the same for all three cases.
[Figure panels: medium variability, high variability, low variability]
55.
What does difference mean?
Which one shows the greatest difference?
[Figure panels: medium variability, high variability, low variability]
56.
t Test: σ Unknown
Assumption
   Population is normally distributed
   If not normal, requires a large sample
t test statistic with n − 1 degrees of freedom
   $t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$
57.
Example: One-Tail t Test
Does an average box of cereal contain more than 368 grams of cereal? A random sample of 36 boxes showed X̄ = 372.5 and s = 15. Test at the α = 0.01 level.
σ is not given
H0: µ ≤ 368
H1: µ > 368
58.
Example Solution: One-Tail
H0: µ ≤ 368
H1: µ > 368
α = 0.01
n = 36, df = 35
Critical value: 2.4377
Test statistic: $t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{36}} = 1.80$
Decision: Do not reject at α = .01
Conclusion: No evidence that the true mean is more than 368
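The t statistic uses the same arithmetic as the z statistic, with the sample standard deviation s in place of σ. In the sketch below the helper name `t_statistic` is illustrative, and the critical value 2.4377 is simply copied from the slide because Python's standard library has no t distribution; a statistics package would be needed to compute it.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, s, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (s / sqrt(n))

t = t_statistic(372.5, 368, 15, 36)  # (372.5 - 368) / (15 / 6) = 4.5 / 2.5
critical_value = 2.4377              # tabulated value from the slide: df = 35, alpha = 0.01, one tail
reject_h0 = t > critical_value

print(round(t, 2), reject_h0)
```

The statistic is 1.80, below 2.4377, so H0 is not rejected at α = .01.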
59.
The t Table
Since it takes into account the changing shape of the distribution as n increases, there is a separate curve for each sample size (or degrees of freedom). However, there is not enough space in the table to put all of the different probabilities corresponding to each possible t score. The t table lists commonly used critical regions (at popular alpha levels).
60.
Z-distribution versus t-distribution
61.
The z-Test for Comparing Population Means
Critical values for standard normal distribution
62.
Summary
We can use the z distribution for testing hypotheses involving one or two independent samples
To use z, the samples are independent and normally distributed
The sample size must be greater than 30
Population parameters must be known