I will introduce the specific logic and main steps of hypothesis testing (HT) through an example.
Critical t value for α = 0.05 (two-sided) and df = 19. Explain the meaning of the 95% CI.
There are two possible explanations for the difference between 9.15 and 10.50. One is sampling error; the other is a different population mean.
We have to choose between the two hypotheses, but we cannot prove which one is correct. The best approach is to find which hypothesis is more strongly contradicted by the data and reject it. The evidence we collect takes the form of a probability. H0 is relatively simple, and the distribution of the test statistic under H0 is easy to find, so we focus on H0: reject it or do not reject it.
Under H0, we want the probability P of obtaining the current sample data or data more extreme. For the mean of a normal distribution with unknown variance, the t statistic is used.
The statistic follows a t distribution under H0. A small P value means that, if H0 were true, the current situation or an even more extreme one would be unlikely to occur; that is, the data do not support H0.
A small probability α that can be treated as negligible, such as α = 0.05, should be defined in advance, so that the current P value can be judged small (or almost zero) by comparison with it.
How should we understand the statement “it does not mean that the difference is big or obvious”?
When we test a hypothesis, we have to make a choice: reject or do not reject H0 (to be or not to be). The decision is based on probability, not proof, so we may make a mistake. There are two kinds of mistake we might make.
If we believe that the mean pulse of healthy males in the mountainous area can never be lower than that in the general population, a one-sided test should be used.
1 ounce = 28.35 g; 115 oz ≈ 3.26 kg (6.5 jin); 120 oz ≈ 3.40 kg (6.8 jin)
If a type I error is made, a special-care nursery will be recommended, with all the extra costs involved, when in fact it is not needed. If a type II error is made, a special-care nursery will not be recommended when in fact it is needed. The nonmonetary cost of this decision is that low-birthweight babies may not survive without the unique equipment in a special-care nursery.
Transcript of "Chapter 4(1) Basic Logic"
1.
Chapter 4: Hypothesis Testing for Continuous Variables (MBBS.WEEBLY.COM)
3.
4.1 Specific logic and main steps of hypothesis testing
4.
4.1 Specific logic and main steps of hypothesis testing <ul><li>Example 4.1 : Randomly select 20 cases from the patients with a certain kind of disease. The sample mean of blood sedimentation (erythrocyte sedimentation rate, mm/h) is 9.15 and the sample standard deviation is 2.13. Estimate the 95% confidence interval and the 99% confidence interval of the population mean, assuming that the blood sedimentation of patients with this disease follows a normal distribution. </li></ul>
5.
The 95% confidence interval is (8.15, 10.15); the 99% confidence interval is (7.79, 10.51).
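The interval calculation for Example 4.1 can be sketched as follows. This is a minimal illustration, not the slide author's method: it uses only the Python standard library, integrates the t density numerically instead of reading a t table, and the helper names (`t_pdf`, `t_upper_tail`, `t_critical`, `mean_ci`) are hypothetical.

```python
import math

def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, n=1000):
    # P(T >= t), from Simpson's rule on [0, |t|] (n must be even).
    a = abs(t)
    h = a / n
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

def t_critical(alpha, df):
    # Two-sided critical value c with P(|T| >= c) = alpha, by bisection.
    # Assumption: c lies in [0, 10], which holds for the usual alpha levels.
    lo, hi = 0.0, 10.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci(mean, sd, n, level):
    # Confidence interval for a normal mean with unknown variance.
    se = sd / math.sqrt(n)               # standard error of the mean
    c = t_critical(1 - level, n - 1)     # critical t with n - 1 df
    return mean - c * se, mean + c * se

# Summary statistics from Example 4.1: n = 20, mean = 9.15, SD = 2.13.
ci95 = mean_ci(9.15, 2.13, 20, 0.95)
ci99 = mean_ci(9.15, 2.13, 20, 0.99)
print("95%% CI: (%.2f, %.2f)" % ci95)
print("99%% CI: (%.2f, %.2f)" % ci99)
```

The 99% interval is wider than the 95% interval because a larger critical value is needed to cover the population mean with higher confidence.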
6.
Another consideration: <ul><li>However, researchers often have preconceived ideas about what these parameters might be and wish to test whether the data conform to these ideas. </li></ul>
7.
Question: <ul><li>Is the population mean equal to 10.50, the value reported in the literature? </li></ul><ul><li>This is a typical hypothesis-testing problem. </li></ul>
8.
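Sample mean 9.15 versus reported population mean μ0 = 10.50. How to explain this difference? Two guesses: (1) sampling error; (2) a different population mean.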
9.
4.1.1 Set up the statistical hypotheses: null hypothesis H0: μ = 10.50; alternative hypothesis H1: μ ≠ 10.50
10.
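4.1.2 Select the test statistic and calculate its current value: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{9.15 - 10.50}{2.13/\sqrt{20}} = -2.8345$, with $\nu = n - 1 = 19$ degrees of freedom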
11.
Fig.4.1 Demonstration for the current value of t and the P-value (t distribution, symmetric around 0, with marks at −2.8345, 0 and 2.8345)
12.
4.1.3 Determine the P value <ul><li>The P-value is defined as the probability, calculated under H0, that the current situation or an even more extreme situation (in the direction of H1) appears. </li></ul>
13.
<ul><li>The P-value can also be thought of as the probability of obtaining a test statistic as extreme as or more extreme than the actual test statistic obtained, given that the null hypothesis is true. </li></ul>
14.
Fig.4.1 Demonstration for the current value of t and the P-value (current situation at t = −2.8345; extreme situations in both tails beyond ±2.8345; 0.01 < P < 0.02)
15.
4.1.4 Decision and conclusion <ul><li>In general, the decision rule is: </li></ul><ul><li>When P ≤ α, reject H0; </li></ul><ul><li>otherwise, do not reject H0. </li></ul>A small probability α (the significance level), such as α = 0.05, should be defined in advance.
16.
Statements: <ul><li>For convenience, “reject H0” is often stated as “there is a statistically significant difference” or “the difference is statistically significant”, but this does not mean that the difference is big or obvious; </li></ul>
17.
Statements: <ul><li>accordingly, “do not reject H0” is often stated as “there is no statistically significant difference” or “the difference is not statistically significant”. </li></ul><ul><li>This means there is not enough evidence to reject H0; it does not straightforwardly mean “accept H0”. </li></ul>
18.
Conclusion: <ul><li>The result of the above example can be reported as: t = −2.8345, P < 0.02, reject H0; that is, there is a statistically significant difference between the population mean and 10.50 mm/h, the value reported in the literature. </li></ul><ul><li>Taking the background into account, it is reasonable to consider that the blood sedimentation (mm/h) of this kind of patient might be lower than 10.50 on average. </li></ul>
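The steps of Example 4.1 (statistic, P-value, decision) can be sketched in a few lines. This is only an illustration under stated assumptions: the P-value comes from numerical integration of the t density rather than a t table, and the helper names (`t_pdf`, `t_upper_tail`, `one_sample_t`) are hypothetical.

```python
import math

def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, n=2000):
    # P(T >= t), from Simpson's rule on [0, |t|] (n must be even).
    a = abs(t)
    h = a / n
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

def one_sample_t(mean, sd, n, mu0):
    # Two-sided one-sample t test from summary statistics.
    t = (mean - mu0) / (sd / math.sqrt(n))
    p = 2 * t_upper_tail(abs(t), n - 1)   # both tails beyond |t|
    return t, p

alpha = 0.05                              # defined in advance
t_stat, p_value = one_sample_t(9.15, 2.13, 20, 10.50)
decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(f"t = {t_stat:.4f}, P = {p_value:.4f}: {decision}")
```

Doubling the upper-tail area gives the two-sided P-value because the t distribution is symmetric around 0.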
19.
Two Errors: <ul><li>Type I error : H0 is true, but we reject it. </li></ul><ul><li>Type II error : H0 is not true, but we do not reject it. </li></ul>
20.
Power: the probability of detecting a predefined statistically significant difference. Making Type I or Type II errors often results in monetary and nonmonetary costs.
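A small simulation can make the two error types and power concrete. The sketch below borrows the setting of Example 4.1 (n = 20, μ0 = 10.50) and makes two assumptions purely for illustration: the population SD is taken to be 2.13, and the alternative population mean is taken to be 9.15. The critical value 2.093 is the tabulated two-sided t for α = 0.05 with 19 degrees of freedom; the function names are hypothetical.

```python
import math
import random

T_CRIT = 2.093  # two-sided critical t for alpha = 0.05, df = 19 (t table)

def rejects_h0(sample, mu0):
    # One-sample t test: True if |t| exceeds the critical value.
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    t = (mean - mu0) / (sd / math.sqrt(n))
    return abs(t) > T_CRIT

def rejection_rate(true_mean, mu0=10.50, sd=2.13, n=20, reps=5000, seed=1):
    # Share of simulated samples in which H0: mu = mu0 is rejected.
    # Assumption: sd = 2.13 is treated as the population SD.
    rng = random.Random(seed)
    hits = sum(
        rejects_h0([rng.gauss(true_mean, sd) for _ in range(n)], mu0)
        for _ in range(reps)
    )
    return hits / reps

# H0 true (mu = 10.50): every rejection is a type I error.
type1_rate = rejection_rate(true_mean=10.50)
# H0 false (assumed mu = 9.15): every rejection is a correct detection.
power = rejection_rate(true_mean=9.15)
type2_rate = 1 - power  # failing to reject a false H0

print(f"type I error rate ~ {type1_rate:.3f}")
print(f"power ~ {power:.3f}, type II error rate ~ {type2_rate:.3f}")
```

With H0 true, the rejection rate should settle near α; with H0 false, the rejection rate estimates the power for that particular alternative.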
21.
4.2 The t Test for One Group of Data under Completely Randomized Design
22.
4.2 The t Test for One Group of Data under Completely Randomized Design <ul><li>Based on the mean and standard deviation of a sample of n individuals randomly selected from a normally distributed population, if one wants to judge whether the population mean μ is equal to a given constant μ0, the t test for one group of data under completely randomized design can be used. </li></ul>
23.
Main steps: <ul><li>(1) Set up the statistical hypotheses </li></ul><ul><li>(2) Select the test statistic and calculate its current value </li></ul><ul><li>(3) Determine the P-value </li></ul>
24.
<ul><li>(4) Decision and conclusion </li></ul><ul><li>Compare the P-value with the pre-assigned small probability α: if P ≤ α, reject H0; otherwise, do not reject H0. Finally, state the conclusion in light of the background. </li></ul>
25.
Example 4.2 <ul><li>A large scale survey had reported that the mean of pulses for healthy males is 72 times/min . A physician randomly selected 25 healthy males in a mountainous area and measured their pulses, resulting in a sample mean of 75.2 times/min and a standard deviation of 6.5 times/min . Can one conclude that the mean of pulses for healthy males in the mountainous area is higher than that in the general population? </li></ul>
27.
One-sided & two-sided tests: two-sided test: H0: μ = μ0 vs. H1: μ ≠ μ0; one-sided test: H0: μ = μ0 vs. H1: μ > μ0 (or H1: μ < μ0)
28.
<ul><li>Definition: </li></ul><ul><li>A two-sided test is a test in which the values of the parameter being studied under the alternative hypothesis are allowed to be either greater than or less than the values of the parameter under the null hypothesis. </li></ul>
29.
<ul><li>Definition: </li></ul><ul><li>A one-sided test is a test in which the values of the parameter being studied under the alternative hypothesis are allowed to be either greater than or less than the values of the parameter under the null hypothesis, but not both. </li></ul>
30.
Solution: t = 2.69, 0.005 < P < 0.01. Conclusion: the mean of pulses for healthy males in the mountainous area is higher than that in the general population.
31.
Fig.4.1 Demonstration for the current value of t and the P-value (t distribution with marks at −2.69, 0 and 2.69; one-sided P-value, 0.005 < P < 0.01)
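The difference between a one-sided and a two-sided P-value for the same statistic can be sketched as follows. The snippet takes the reported t = 2.69 as given rather than recomputing it, with df = 25 − 1 = 24. It integrates the t density numerically instead of using a t table, and the helper names are hypothetical.

```python
import math

def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, n=2000):
    # P(T >= t), from Simpson's rule on [0, |t|] (n must be even).
    a = abs(t)
    h = a / n
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

t_reported, df = 2.69, 24                    # values taken from the slide
p_one_sided = t_upper_tail(t_reported, df)   # H1: mu > mu0, upper tail only
p_two_sided = 2 * p_one_sided                # H1: mu != mu0, both tails

print(f"one-sided P = {p_one_sided:.4f}")
print(f"two-sided P = {p_two_sided:.4f}")
```

For the same t, the one-sided P-value is half the two-sided one, which is why the choice between the two tests has to be justified by background knowledge before looking at the data.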
32.
Exercise 1: <ul><li>Suppose we want to test the hypothesis that mothers with low socioeconomic status (SES) deliver babies whose birth-weights are lower than “normal”. </li></ul><ul><li>To test this hypothesis, a list is obtained of birth-weights from 100 consecutive, full-term, live-born deliveries from the maternity ward of a hospital in a low-SES area. </li></ul>
33.
<ul><li>The mean birth-weight is found to be 115 oz , with a sample standard deviation of 24 oz . </li></ul><ul><li>Suppose we know from nationwide survey based on millions of deliveries that the mean birth-weight in the United States is 120 oz . </li></ul><ul><li>Can we actually say the underlying mean birth-weight from this hospital is lower than the national average? </li></ul>
34.
Questions: <ul><li>How should the hypothesis be tested? </li></ul><ul><li>What are the type I and type II errors for these data? What consequences would each error have? </li></ul>
37.
Fig.4.1 Demonstration for the current value of t and the P-value (t distribution with marks at −2.08, 0 and 2.08; one-sided P-value, 0.01 < P < 0.05)
38.
Step 3: <ul><li>We can reject H0 at a significance level of 0.05. </li></ul><ul><li>The true mean birth-weight is significantly lower in this hospital than in the general population. </li></ul>
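The calculation behind Exercise 1 can be sketched from the summary statistics given in the exercise (n = 100, mean 115 oz, SD 24 oz, μ0 = 120 oz), using a one-sided test with H1: μ < 120. As before, this is an illustration with hypothetical helper names, and numerical integration stands in for a t table.

```python
import math

def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, n=2000):
    # P(T >= t), from Simpson's rule on [0, |t|] (n must be even).
    a = abs(t)
    h = a / n
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

def one_sample_t_lower(mean, sd, n, mu0):
    # One-sided test of H0: mu = mu0 against H1: mu < mu0.
    t = (mean - mu0) / (sd / math.sqrt(n))
    p = t_upper_tail(-t, n - 1)   # P(T <= t) = P(T >= -t) by symmetry
    return t, p

alpha = 0.05
t_stat, p_lower = one_sample_t_lower(115, 24, 100, 120)
decision = "reject H0" if p_lower <= alpha else "do not reject H0"
print(f"t = {t_stat:.2f}, one-sided P = {p_lower:.4f}: {decision}")
```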
39.
Two Errors: <ul><li>The probability of a type I error is the probability of deciding that the mean birth-weight in the hospital was lower than 120 oz when in fact it was 120 oz. </li></ul><ul><li>If a type I error is made, a special-care nursery will be recommended, with all the extra costs involved, when in fact it is not needed. </li></ul>
40.
<ul><li>The probability of a type II error is the probability of deciding that the mean birth-weight was 120 oz when in fact it was lower than 120 oz. </li></ul><ul><li>If a type II error is made, a special-care nursery will not be recommended when in fact it is needed. The nonmonetary cost of this decision is that low-birthweight babies may not survive without the unique equipment in a special-care nursery. </li></ul>