Ebd1 lecture 6&7 2010

1. General Studies: Community Dentistry 1
Statistical Inference, Lecture 6
Dr Nizam Abdullah
Contents: review of descriptive statistics; the normal curve; introduction to inferential statistics.
© The University of Adelaide, School of Dentistry
2. Descriptive statistics
Central tendency: mean; median (50th percentile); mode.
Dispersion: standard deviation (SD) / variance; inter-quartile range (IQR) = 3rd quartile – 1st quartile; range = maximum – minimum.

Distribution of a variable
Another important aspect of describing a variable is the shape of its distribution, which tells you how frequently values occur in different ranges of the variable. Typically, a researcher is interested in how well the distribution can be approximated by the normal distribution. The normal distribution can be used to determine how far a sample estimate is likely to be from the true population value, i.e. how big a 'margin of error' there is likely to be. Simple descriptive statistics can provide some information relevant to this issue.
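The measures listed above can be computed with Python's standard statistics module. The sketch below is illustrative rather than the method used in the course: describe is a hypothetical helper, the SD is the sample SD (n – 1 denominator), and quartiles use Python's default 'exclusive' method, so other software may report a slightly different IQR. The input is the ages of cases 1–15 from the data spreadsheet in this lecture.

```python
import statistics

def describe(values):
    """Summarise central tendency and dispersion for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points ('exclusive' method)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),     # 50th percentile
        "mode": statistics.multimode(values),    # every most-frequent value
        "sd": statistics.stdev(values),          # sample SD, n - 1 denominator
        "variance": statistics.variance(values),
        "iqr": q3 - q1,                          # 3rd quartile - 1st quartile
        "range": max(values) - min(values),      # maximum - minimum
    }

# Ages (years) of cases 1-15 in the lecture's data spreadsheet
ages = [21, 22, 18, 20, 19, 19, 24, 18, 29, 28, 18, 38, 23, 20, 29]
summary = describe(ages)
print(summary)
```

For these 15 ages the median is 21 and the mode is 18 years, while the mean (about 23.1) lies above the median.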
3. Distribution of a variable (cont.)
A variable is said to be normally distributed, or to have a normal distribution, if its distribution has the shape of a normal curve, a particular kind of bell-shaped curve. A normal distribution (and hence a normal curve) is completely determined by its mean and standard deviation, which are called the parameters of the normal curve. The normal curve is symmetric and centered on the mean. The standard deviation determines the spread of the curve: the larger the standard deviation, the flatter and more spread out the curve.

Normal curve (cont.)
The mean, median and mode all have the same value.
4. Different shapes of the normal curve
The standard deviation changes the relative width of the distribution: the larger the standard deviation, the wider the curve.

Properties of the normal distribution
[Figure: Age distribution of Village A, a histogram of frequency by 5-year age group (0–4 to 80+) with the 50%, 68%, 95% and 99.7% areas marked]
• Bell-shaped curve.
• Symmetrical about its mean (each side is a mirror image of the other).
• Mean and median are equal.
• Each side of the mean contains 50% of the area.
• The area between mean – 1SD and mean + 1SD is 68%.
• The area between mean – 2SD and mean + 2SD is 95%.
• The area between mean – 3SD and mean + 3SD is 99.7%.
e.g. Age: mean = 40, SD = 10. Therefore mean ± 1SD = 30, 50, so 68% of the group will be between 30 and 50 years old. Similarly, mean ± 2SD = 20, 60 (95%) and mean ± 3SD = 10, 70 (99.7%).
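The percentages on this slide can be checked against the normal curve itself. This minimal sketch uses the standard library's NormalDist with the slide's example (mean 40, SD 10); area_within is a hypothetical helper name.

```python
from statistics import NormalDist

def area_within(dist, k):
    """Return (lower, upper, area) for the interval mean ± k SD of a normal distribution."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return lower, upper, dist.cdf(upper) - dist.cdf(lower)

age = NormalDist(mu=40, sigma=10)  # the slide's example: mean 40, SD 10
for k in (1, 2, 3):
    lower, upper, area = area_within(age, k)
    print(f"Mean ± {k}SD = {lower:.0f}, {upper:.0f}: {area:.1%} of the group")
```

The exact areas are about 68.3%, 95.4% and 99.7%; the rule rounds them, and exactly 95% of the area lies within ±1.96 SD of the mean.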
5. Normal curve: 68-95-99.7 rule
• 68% of the observations fall within one standard deviation of the mean (µ – σ to µ + σ).
• 95% of the observations fall within two standard deviations of the mean (µ – 2σ to µ + 2σ).
• 99.7% of the observations fall within three standard deviations of the mean (µ – 3σ to µ + 3σ).

Distributions: negative skew
In a negatively skewed distribution, the mode is at the peak of the curve, the median is lower than the mode, and the mean is lower than the median. The result is a 'tail' towards the more negative side of the graph.
Negative skewness (tail to left = left skewed): Median < Mode; Mean < Median.
6. Distributions: positive skew
In a positively skewed distribution, the mode is at the peak of the curve, the median is higher than the mode, and the mean is higher than the median. The result is a 'tail' towards the more positive side of the graph.
Positive skewness (tail to right = right skewed): Median > Mode; Mean > Median.

Example dataset
First-year BDS students enrolled in EBD1. Response to survey: n = 90 (out of 119), or 76%.
Variables:
– Age: quantitative variable measured on a ratio scale
– Sex: qualitative variable measured on a nominal scale, i.e. a variable with the categories male and female
– Height: quantitative variable measured on a ratio scale
– Weight: quantitative variable measured on a ratio scale
Variables measured at a higher level can always be converted to a lower level, but not vice versa. For example, observations of actual age (ratio scale) can be converted to the categories older and younger (ordinal scale). The same applies to height and weight.
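A quick way to see the direction of skew is to compare the mean with the median, following the orderings on the two skewness slides. This is a rule of thumb, not a formal measure of skewness: it can mislead for multimodal or heavily discrete data. skew_direction and the example lists are hypothetical.

```python
import statistics

def skew_direction(values):
    """Rough guide to skew from the mean-median comparison (a heuristic, not a formal test)."""
    mean, median = statistics.mean(values), statistics.median(values)
    if mean > median:
        return "positive (tail to right)"
    if mean < median:
        return "negative (tail to left)"
    return "symmetric"

print(skew_direction([1, 2, 2, 3, 10]))  # one large value pulls the mean above the median
print(skew_direction([1, 8, 9, 9, 10]))  # one small value pulls the mean below the median
```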
7. Data spreadsheet
Case  Age  Sex  Height  Weight
1     21   2    165     45
2     22   2    170     53
3     18   1    .       74
4     20   2    165     44
5     19   1    175     70
6     19   2    163     53
7     24   2    163     49
8     18   1    170     60
9     29   2    178     70
10    28   2    163     58
11    18   2    177     72
12    38   2    164     65
13    23   2    161     65
14    20   2    178     63
15    29   2    159     54
:     :    :    :       :

Frequency distribution of height variable
Height (cm)  Frequency
150–155      4
155–160      9
160–165      21
165–170      24
170–175      10
175–180      10
180–185      7
185–190      3
190–195      1
Total        89
Mode = 165 cm and 170 cm; Mean = 169.3 cm; Median = 168 cm
[Figure: histogram of height, 150–195 cm]
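A grouped frequency table like the height table can be built by assigning each value to a 5 cm class. This sketch applies it only to the 15 cases shown in the spreadsheet, so its counts are not the lecture's full table. Assumptions: each class includes its lower limit and excludes its upper limit (the slide does not say how boundary values such as 165 cm were counted), and the missing height for case 3 is skipped. frequency_table is a hypothetical helper.

```python
from collections import Counter

# Heights (cm) of cases 1-15 from the spreadsheet; None marks case 3's missing value
heights = [165, 170, None, 165, 175, 163, 163, 170, 178, 163, 177, 164, 161, 178, 159]

def frequency_table(values, start, width):
    """Count non-missing values in classes [start, start + width), [start + width, ...)."""
    counts = Counter()
    for v in values:
        if v is None:
            continue  # skip missing observations
        lower = start + ((v - start) // width) * width  # lower limit of v's class
        counts[(lower, lower + width)] += 1
    return dict(sorted(counts.items()))

table = frequency_table(heights, start=150, width=5)
for (lo, hi), f in table.items():
    print(f"{lo}-{hi}: {f}")
```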
8. Frequency distribution of weight variable
Weight (kg)  Freq.
40–45        4
45–50        9
50–55        15
55–60        16
60–65        13
65–70        7
70–75        11
75–80        4
80–85        3
85–90        3
90–95        2
95–100       1
100–105      1
105–110      0
110–115      0
115–120      0
120–125      1
Total        90
Mean = 62.6 kg; Median = 60 kg
[Figure: histogram of weight, 40–125 kg]

Frequency distribution of age variable
Mode (18 yrs) < Median (19 yrs) < Mean (20 yrs)
9. Descriptive statistics
Variable  n   Min    Max    Range  SD
Age       90  17.0   38.0   21.0   3.4
Height    89  152.0  191.0  39.0   8.8
Weight    90  40.0   120.0  80.0   14.5

Variable  Category  Freq  %
Sex       Male      32    35.6
          Female    58    64.4
          Total     90    100.0

What is inferential statistics?
It is the statistical technique used to generalise from a sample result (a statistic) to the population (a parameter).
Population (Village A): µ = ?
Sample: x̄ = 10.14
The technique used to go from the sample to the population is called 'inferential statistics'.
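The Range and % columns in these tables follow directly from the other columns: range = maximum – minimum, and each percentage is the category count divided by the total. A minimal check, with percent_table as a hypothetical helper.

```python
def percent_table(counts):
    """Convert category counts to percentages of the total, rounded to one decimal place."""
    total = sum(counts.values())
    return {category: round(100 * f / total, 1) for category, f in counts.items()}, total

sex_percent, total = percent_table({"Male": 32, "Female": 58})

# Range = maximum - minimum, from the Min and Max columns of the table
min_max = {"Age": (17.0, 38.0), "Height": (152.0, 191.0), "Weight": (40.0, 120.0)}
ranges = {name: hi - lo for name, (lo, hi) in min_max.items()}
print(sex_percent, total, ranges)
```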
10. Statistical inference
Inferential statistics are used to draw inferences about a population from a sample. For example, the average number of decayed teeth in children aged 5 years can be estimated using observations from a sample of 5-year-olds.

Selecting a sample from a population
How can a sample that is representative of the population of interest be selected?
Answer: by random selection.
When a random sample is drawn from the population of interest, every member of the population has the same probability, or chance, of being selected in the sample. For this reason, random samples are considered to be unbiased.
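Simple random selection, in which every member has the same chance of being chosen, can be sketched with Python's random.sample, which draws without replacement. The sampling frame of 500 ID numbers, the sample size of 50 and simple_random_sample are hypothetical; seeding the generator only makes a draw reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member has the same chance of selection."""
    rng = random.Random(seed)  # separate generator so a seeded draw can be repeated
    return rng.sample(population, n)

# Hypothetical sampling frame: ID numbers for 500 five-year-old children
frame = list(range(1, 501))
sample = simple_random_sample(frame, 50, seed=1)
print(sorted(sample))
```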
11. Two types of inferential statistics
• Parameter estimation
• Hypothesis testing

1. Parameter estimation
Parameter estimation takes two forms:
1. Point estimation
2. Interval estimation
12. Definition
• A point estimate is a single numerical value used to estimate the corresponding population parameter.
• An interval estimate consists of two numerical values defining a range of values that, with a specified degree of confidence, we believe includes the parameter being estimated.

Parameter estimation
A point estimate gives the estimate of the population parameter as a single number, e.g. the sample mean, median, variance or standard deviation. Interval estimation involves more than one point: it gives a range of values within which the population parameter is thought to lie, i.e. a confidence interval defined by the upper and lower limits of that range. Point and interval estimates let us infer the true value of an unknown population parameter using information from a random sample of that population.
13. Confidence intervals (cont.)
Example
Suppose a paper reports that, among a sample of 2,823 5–6-year-old children living in Sharjah, the mean number of decayed teeth is 0.81 (SD = 1.66) with a 95% confidence interval of (0.75, 0.87).
Interpretation
The 95% confidence interval is the range of plausible values for the mean number of decayed teeth in the population of 5–6-year-old children living in Sharjah. Because only a sample of children was used, the exact population mean cannot be known for certain; the 95% confidence interval therefore indicates the margin of imprecision due to sampling error. Strictly, the 95% describes the method: if the study were repeated many times, about 95% of the intervals calculated in this way would contain the true population mean. It is often described, more loosely, as the range in which there is a 95% chance that the true population mean lies.
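The reported interval can be reproduced from the mean, SD and sample size, assuming the paper used the large-sample normal multiplier 1.96 (the paper's exact method is not stated here). mean_ci is a hypothetical helper.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Large-sample CI for a mean: mean ± z * SD / sqrt(n)."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return mean - z * se, mean + z * se

low, high = mean_ci(0.81, 1.66, 2823)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```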
14. 1. Estimation (CI)
Population: µ = ?
Sample: x̄ = 10.14, s.d. = 4.3, n = 100
$\mathrm{S.E.} = \dfrac{\mathrm{s.d.}}{\sqrt{n}} = \dfrac{4.3}{\sqrt{100}} = 0.43$
$\mathrm{CI} = \bar{x} \pm t_{\alpha/2} \times \mathrm{S.E.}$; for a sample this large, $t_{0.025}$ is close to the normal value 1.96, so
$95\%\ \mathrm{CI} = 10.14 \pm (1.96 \times 0.43) = 10.14 \pm 0.84 = (9.30,\ 10.98)$

1. Estimation (CI)
95% CI = 9.30, 10.98
We are 95% confident that the mean of the population lies between 9.30 and 10.98.
For a 99% CI, replace 1.96 with 2.58: $99\%\ \mathrm{CI} = 10.14 \pm (2.58 \times 0.43) = 10.14 \pm 1.11 = (9.03,\ 11.25)$
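The multipliers 1.96 and 2.58 are the points of the standard normal distribution that leave 2.5% and 0.5% in each tail. This sketch derives them with NormalDist.inv_cdf and applies them to the sample above; z_critical is a hypothetical helper. For small samples, a t multiplier (which is larger) would be used instead.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value from the standard normal distribution."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

mean, sd, n = 10.14, 4.3, 100
se = sd / math.sqrt(n)  # standard error = 0.43
for level in (0.95, 0.99):
    z = z_critical(level)
    print(f"{level:.0%} CI: z = {z:.2f}, ({mean - z * se:.2f}, {mean + z * se:.2f})")
```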
15. 95% confidence interval formula
$\text{Estimate} \pm 1.96 \times \text{Std. error}$; e.g. for a mean, $\bar{x} \pm 1.96 \times \dfrac{\text{Std. dev.}}{\sqrt{n}}$

Standard deviation vs. standard error of the statistic
These two statistics are used for very different purposes. The standard deviation is a measure of the spread of a set of observations. The standard error measures sampling error and is used to indicate the precision of a statistic, i.e. how close the statistic is likely to be to the parameter it is estimating.

Standard error example
In a sample of 2,823 5–6-year-old children living in Sharjah, the mean number of decayed teeth is 0.81 and the standard deviation is 1.66:
$0.81 \pm 1.96 \times \dfrac{1.66}{\sqrt{2823}}$, where the standard error of the mean is $\dfrac{1.66}{\sqrt{2823}} \approx 0.03$.
So the sample mean of 0.81 is our best estimate, and the mean of a sample of this size would typically differ from the population mean by about 0.03 decayed teeth. The standard error of the sample mean indicates the extent to which the sample mean is likely to deviate from the population mean.
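The difference between the standard deviation and the standard error can be seen by simulation: draw many samples, compute each sample's mean, and measure how spread out those means are. The population below (normal, mean 10, SD 4.3) and sd_of_sample_means are hypothetical; with n = 100 the spread of the sample means should come out close to 4.3/√100 = 0.43, far smaller than the SD of the individual observations.

```python
import random
import statistics

def sd_of_sample_means(mu, sigma, n, reps, seed=0):
    """Simulate reps samples of size n from a normal population; return the SD of their means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

observed = sd_of_sample_means(10, 4.3, 100, reps=2000)
theoretical = 4.3 / 100 ** 0.5  # standard error = SD / sqrt(n)
print(f"SD of sample means: {observed:.3f}; SD / sqrt(n): {theoretical:.3f}")
```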
16. 2. Hypothesis testing

What is hypothesis testing?
In estimation, we estimate a population parameter from a sample statistic.
In hypothesis testing, we answer a specific question about a population parameter.
17. Hypothesis testing
• A (statistical) hypothesis is a statement of belief about population parameters.
• It is a predominant feature of quantitative research in oral health and in health care research in general.
• Researchers can test a hypothesis to see whether the collected data support or refute it.

Two types of hypotheses
• The null hypothesis, symbolised by H0, proposes no relationship between two variables, or no effect, in the population.
• The alternative hypothesis, symbolised by HA, is a statement that disagrees with the null hypothesis.
18. • If the null hypothesis is rejected as a result of sample evidence, then the alternative hypothesis is concluded.
• If the evidence is insufficient to reject it, the null hypothesis is retained, but not accepted.
• Traditionally, researchers do not accept the null hypothesis on current evidence; they state that it cannot be rejected.

Example
A toothpaste company claims that its toothpaste contains, on average, 1100 ppm of fluoride. Suppose we are interested in testing this claim. We randomly sample 100 tubes of toothpaste (i.e. n = 100) from this company and, under identical conditions, calculate the average fluoride content (in ppm) for this sample. In the sample of 100 tubes, the average was found to be 1035 ppm (= x̄). Could this sample have been drawn from a population with a mean fluoride content of µ = 1,100 ppm (known variance σ² = 200)?
19. Basic steps in hypothesis testing
1. Propose a research question (identify the parameter of interest).
2. State the null hypothesis, H0, and the alternative hypothesis, HA.
3. Define a threshold value for declaring a P-value significant. The threshold is called the significance level of the test; it is denoted by alpha (α) and is commonly set to 0.05.
4. Select the appropriate statistical test to compute the P-value.

Basic steps in hypothesis testing (cont.)
5. Compare the P-value of your test with the chosen level of significance. Can the null hypothesis be rejected?
6. If P-value < α, conclude that the difference is statistically significant and reject the null hypothesis. If P-value ≥ α, conclude that the difference is not statistically significant and do not reject the null hypothesis.
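Steps 3 to 6 can be sketched for one common test, the one-sample z-test, which assumes the population SD is known. This illustrates the procedure only; it is not necessarily the test behind the toothpaste example, and one_sample_z_test and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 when the population SD (sigma) is known."""
    se = sigma / math.sqrt(n)                   # standard error of the mean
    z = (sample_mean - mu0) / se                # test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided P-value
    decision = "reject H0" if p < alpha else "do not reject H0"  # step 6
    return z, p, decision

print(one_sample_z_test(102, 100, 10, 100))  # z = 2.0
print(one_sample_z_test(101, 100, 10, 100))  # z = 1.0
```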
20. Example
A toothpaste company (X) claims that its toothpaste contains, on average, 1100 ppm of fluoride. What is the research question?

What is hypothesis testing?
Research question: Is the mean fluoride content in toothpaste X 1100 ppm? Answer: yes or no.
1) Null hypothesis: the mean fluoride content in toothpaste X is equal to 1100 ppm. H0: µ = 1100
2) Alternative hypothesis: the mean fluoride content in toothpaste X is not equal to 1100 ppm. HA: µ ≠ 1100
21. Define the significance level α (commonly set at 0.05)
Select the appropriate test to compute the P value
At the end of the hypothesis test, we obtain a P value. If the P value is less than 0.05, we reject the null hypothesis (H0). If the P value is greater than or equal to 0.05, we cannot reject the null hypothesis (H0).

Q: Is the fluoride content in toothpaste X 1100 ppm? Answer: yes or no.
H0: µ = 1100; HA: µ ≠ 1100; x̄ = 1035; variance = 200; n = 100
In the above example, if we get P = 0.01, we reject the null hypothesis (H0) and conclude in favour of the alternative hypothesis (HA): 'the mean fluoride content in toothpaste X is different from 1100 ppm'. Alternatively, we may report that 'the mean fluoride content is significantly different from 1100 ppm'.
Note: the second form of the conclusion is more commonly used in the literature.
22. Q: Is the fluoride content in toothpaste X 1100 ppm? Answer: yes or no.
H0: µ = 1100; HA: µ ≠ 1100; x̄ = 1035; variance = 200; n = 100
In the above example, if we get P = 0.08, we CANNOT reject the null hypothesis (H0), so we cannot conclude the alternative hypothesis (HA). We report that 'there is insufficient evidence that the mean fluoride content in toothpaste X differs from 1100 ppm' or, more commonly, that 'the mean fluoride content is NOT significantly different from 1100 ppm'.

What is a P value?
Q: Is the mean fluoride content in toothpaste X 1100 ppm? Answer: yes or no.
H0: µ = 1100; HA: µ ≠ 1100; x̄ = 1035; variance = 200; n = 100
If the P value is less than 0.05, we reject the null hypothesis.
The P value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. It is often described, informally, as the risk of error if you reject the null hypothesis and conclude the alternative hypothesis.
Example: P value = 0.01. This means that, if H0 were true, a sample mean at least this far from 1100 ppm would occur only 1% of the time, so concluding the alternative hypothesis ('significantly different') carries only a small risk of error. We normally accept a risk of less than 5%, which is why the cut-off point for the P value is 0.05.
23. What is a P value?
Q: Is the mean fluoride content in toothpaste X 1100 ppm? Answer: yes or no.
H0: µ = 1100; HA: µ ≠ 1100; x̄ = 1035; variance = 200; n = 100
If the P value is less than 0.05, we reject the null hypothesis.
Example: P value = 0.2. This means that, if H0 were true, a result at least this extreme would occur 20% of the time. That is too large a risk of error for concluding the alternative hypothesis ('significant difference'). Therefore, we cannot conclude that it is 'significantly different'; we have to conclude that 'the difference is not significant'.

What is a P value? (cont.)
Q: Is the mean fluoride content in toothpaste X 1100 ppm? Answer: yes or no.
H0: µ = 1100; HA: µ ≠ 1100; x̄ = 1035; variance = 200; n = 100
If the P value is less than 0.05, we reject the null hypothesis. This means that we have set the cut-off point for rejecting H0 at P < 0.05, which we describe as setting 'alpha' at 0.05. The name comes from the type of error we have been talking about: wrongly rejecting a true null hypothesis is called a 'Type I error' or 'alpha error', and α is the probability of making it.
24. The use of P-values in hypothesis testing
Definition
The P-value is the smallest level of significance that would lead to rejection of the null hypothesis H0. (The P-value is the observed significance level.) Statistical tests produce a P-value, which answers the question: 'Is there a statistically significant difference between the study groups?'

P-values
Most scientific articles report a P-value associated with a test. Generally, the P-value is compared with a significance level (α) of 0.05 or 0.01 to determine whether or not the result is statistically significant.
Decision rules:
• If P-value ≤ α, reject H0 at level α (a statistically significant result).
• If P-value > α, do not reject H0 at level α (not statistically significant).
Example: P-value < 0.05 indicates that, if H0 were true, there would be less than a 5% chance of observing results at least as extreme as these by chance alone. We reject H0 and conclude that the result is significant.
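The meaning of a P-value, the probability of a result at least as extreme as the one observed when H0 is true, can be checked by simulation: generate many test statistics under H0 and count how often they are at least as extreme as the observed one. The observed statistic z = 2.0 is hypothetical and simulated_p_value is an illustrative helper; the result should be close to the exact two-sided P-value from the normal distribution.

```python
import random
from statistics import NormalDist

def simulated_p_value(z_obs, reps=100_000, seed=0):
    """Share of standard-normal draws (H0 true) at least as extreme as z_obs, two-sided."""
    rng = random.Random(seed)
    extreme = sum(abs(rng.gauss(0, 1)) >= abs(z_obs) for _ in range(reps))
    return extreme / reps

z_obs = 2.0  # hypothetical observed test statistic
exact_p = 2 * (1 - NormalDist().cdf(abs(z_obs)))
approx_p = simulated_p_value(z_obs)
print(f"exact P = {exact_p:.4f}, simulated P = {approx_p:.4f}")
```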
25. Example (cont.)
Our hypotheses are: H0: µ = 1,100; HA: µ ≠ 1,100.
The P-value for this test was found to be 0.0006. What is your conclusion?
Since P-value < 0.05, we reject H0 in favour of HA, i.e. we reject the original assumption that the sample was drawn from a population where µ = 1,100 and σ² = 200. We say that the sample mean differs significantly from the hypothesised population mean at the 5% level: if H0 were true, a sample mean at least this far from 1,100 would occur with a probability of only 0.0006 (0.06%), well below 5%.

Types of error
When we sample, we select cases from a population of interest. Because of chance variation in selecting the sample's few cases from the population's many possible cases, the sample will deviate from the true nature of the defined population by a certain amount. This is called sampling error. Therefore, inferences from samples to populations are always probabilistic, meaning we can never be 100% certain that an inference is correct. Drawing the wrong conclusion is called an error of inference. There are two types of error of inference, defined in terms of the null hypothesis: Type 1 error and Type 2 error.
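The link between α and the Type 1 error can be demonstrated by simulation: when H0 is true, a test at α = 0.05 should wrongly reject H0 in about 5% of samples. This sketch uses a two-sided z-test with a known SD; the population values (µ0 = 100, σ = 10, n = 30) and type_i_error_rate are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def type_i_error_rate(mu0, sigma, n, alpha=0.05, reps=5000, seed=0):
    """Simulate samples with H0 true and return the share in which a z-test rejects H0."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu0, sigma) for _ in range(n))
        if abs(xbar - mu0) / se > crit:
            rejections += 1  # H0 is true here, so every rejection is a Type 1 error
    return rejections / reps

rate = type_i_error_rate(100, 10, 30)
print(f"Observed Type 1 error rate: {rate:.3f}")
```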
26. Types of error (cont.)
Possibilities related to decisions about H0:
                          Actual situation
Investigator's decision   H0 true                          H0 false
Do not reject H0          Correct decision                 Type II error
Reject H0                 Type I error (probability = α)   Correct decision

Types of error (cont.)
Type 1 and Type 2 errors can be quite difficult to understand, so let's look at a few examples to help you grasp the concept. Let's hypothesise that two groups of dental patients are equal in their knowledge of preventive hygiene behaviours. Now consider the following four scenarios. For each, determine whether or not an error has been made and, if so, what type of error.