Ebd1 lecture 6&7 2010

General Studies
Community Dentistry 1
Statistical Inference
Lecture 6
Dr Nizam Abdullah

Contents

Review of descriptive statistics
The normal curve
Introduction to inferential
statistics

© The University of Adelaide, School of Dentistry


Descriptive statistics

Central tendency
Mean
Median (50th Percentile)
Mode
Dispersion
Standard deviation (SD) / Variance
Inter-quartile range (IQR) (3rd quartile – 1st
quartile)
Range (Maximum – Minimum)
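
As a rough illustration, the measures listed above can be computed with Python's standard `statistics` module. The ages are those of the first 15 cases shown in the lecture's data spreadsheet; the variable names are illustrative only, and the quartiles depend on the interpolation convention used (here the module's default).

```python
import statistics

# Ages of the first 15 cases listed in the lecture's data spreadsheet.
ages = [21, 22, 18, 20, 19, 19, 24, 18, 29, 28, 18, 38, 23, 20, 29]

# Central tendency
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)      # 50th percentile
mode_age = statistics.mode(ages)          # most frequent value

# Dispersion
sd_age = statistics.stdev(ages)           # sample SD (n - 1 denominator)
variance_age = statistics.variance(ages)  # sample variance = SD squared
q1, _, q3 = statistics.quantiles(ages, n=4)  # quartiles, default method
iqr_age = q3 - q1                         # 3rd quartile - 1st quartile
range_age = max(ages) - min(ages)         # maximum - minimum

print(f"mean={mean_age:.1f} median={median_age} mode={mode_age}")
print(f"SD={sd_age:.1f} IQR={iqr_age} range={range_age}")
```

For these 15 cases the mode (18) is below the median (21), which is below the mean (about 23.1).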

Distribution of a variable
Another important aspect of the description of a
variable is the shape of its distribution, which tells
you the frequency of values from different ranges
of the variable.
Typically, a researcher is interested in how well
the distribution can be approximated by the normal
distribution.
The normal distribution can be used to determine
how far the sample is likely to be off from the
overall population, i.e. how big a ‘margin of error’
there is likely to be.
Simple descriptive statistics can provide some
information relevant to this issue.


Distribution of a variable (cont.)
A variable is said to be a normally distributed
variable or to have a normal distribution if its
distribution has the shape of a normal curve - the
normal curve is a kind of bell-shaped curve.
A normal distribution (and hence a normal curve) is
completely determined by its mean and standard
deviation - the mean and standard deviation are
called the parameters of the normal curve.
The normal curve is symmetric and centered about
the mean.
The standard deviation determines the spread of
the curve. The larger the standard deviation, the
flatter and more spread out the curve will be.

Normal curve (cont.)
The mean, median, and mode all have the same value.



Different shapes of the Normal curve

Standard deviation changes the relative width of the
distribution; the larger the standard deviation, the wider
the curve.


Properties of normal distribution
• Bell-shaped curve
• Symmetrical about its mean (mirror image to each side)
• Mean and median are equal.
• One side of the mean is 50% of the area.
• The area between mean-1SD and mean+1SD is 68% (Mean±1SD = 30, 50).
• The area between mean-2SD and mean+2SD is 95% (Mean±2SD = 20, 60).
• The area between mean-3SD and mean+3SD is 99.7% (Mean±3SD = 10, 70).

[Figure: Age distribution of Village A, by 5-year age group from 0-4 to 80+, with the 50%, 68%, 95% and 99.7% areas marked]

e.g. (Age) Mean = 40, SD = 10
Therefore, Mean±1SD = 30, 50
Between 30 yr and 50 yr old, there will be 68% of the group.


Normal curve: 68-95-99.7 rule

68% of the observations fall within one standard
deviation of the mean (µ − σ to µ + σ).
95% of the observations fall within two standard
deviations of the mean (µ − 2σ to µ + 2σ).
99.7% of the observations fall within three standard
deviations of the mean (µ − 3σ to µ + 3σ).
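
The 68-95-99.7 figures are rounded areas under the normal curve. A minimal sketch, using only Python's `math` module, that recovers them and applies them to the age example (mean 40, SD 10); the helper name `area_within` is illustrative.

```python
import math

def area_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

mean, sd = 40, 10  # age example from the slides
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"mean ± {k} SD = ({low}, {high}): {area_within(k):.1%} of the group")
```

The unrounded areas are about 68.3%, 95.4% and 99.7%.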

Distributions: Negative
In a negatively skewed distribution, the mode is at the
top of the curve, the median is lower than it, and the
mean is lower than the median.
The result is a ‘tail’ towards the more negative side of
the graph.

Negative skewness
(tail to left = left skewed)
Median < Mode
Mean < Median



Distributions: Positive
In a positively skewed distribution, the mode is at the
top of the curve, the median is higher than it, and the
mean is higher than the median.
The result is a ‘tail’ towards the more positive side of
the graph.

Positive skewness :
(tail to right = right skewed)
Median > Mode
Mean > Median


Example dataset
First Year BDS students enrolled in EBD1.
Response to survey: n= 90 (out of 119), or 76%.
Variables:
– Age: quantitative variable measured on a ratio scale
– Sex: qualitative variable measured on a nominal
scale, i.e. variable with categories male or female
– Height: quantitative variable measured on a ratio scale
– Weight: quantitative variable measured on a ratio
scale

Variables measured at a higher level can always be
converted to a lower level, but not vice versa.
For example, observations of actual age (ratio
scale) can be converted to categories of older and
younger (ordinal scale). Similarly for height and
weight.


Data spreadsheet
Case  Age  Sex  Height  Weight
1     21   2    165     45
2     22   2    170     53
3     18   1    .       74
4     20   2    165     44
5     19   1    175     70
6     19   2    163     53
7     24   2    163     49
8     18   1    170     60
9     29   2    178     70
10    28   2    163     58
11    18   2    177     72
12    38   2    164     65
13    23   2    161     65
14    20   2    178     63
15    29   2    159     54
:     :    :    :       :

Frequency distribution of height
variable

Height (cm)  Frequency
150-155      4
155-160      9
160-165      21
165-170      24
170-175      10
175-180      10
180-185      7
185-190      3
190-195      1
Total        89

Mode = 165cm, 170cm
Mean = 169.3cm
Median = 168cm


Frequency distribution of weight
variable

Weight (kg)  Freq.
40-45        4
45-50        9
50-55        15
55-60        16
60-65        13
65-70        7
70-75        11
75-80        4
80-85        3
85-90        3
90-95        2
95-100       1
100-105      1
105-110      0
110-115      0
115-120      0
120-125      1
Total        90

Mean = 62.6 kg
Median = 60 kg
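
A mean can also be approximated from a frequency table alone. The sketch below assumes every observation in a class sits at the class midpoint, which is only an approximation, so the result need not match the mean of 62.6 kg calculated from the raw data.

```python
# (lower bound, upper bound, frequency) rows of the weight table
weight_table = [
    (40, 45, 4), (45, 50, 9), (50, 55, 15), (55, 60, 16), (60, 65, 13),
    (65, 70, 7), (70, 75, 11), (75, 80, 4), (80, 85, 3), (85, 90, 3),
    (90, 95, 2), (95, 100, 1), (100, 105, 1), (105, 110, 0),
    (110, 115, 0), (115, 120, 0), (120, 125, 1),
]

total = sum(freq for _, _, freq in weight_table)

# Assumption: each class is represented by its midpoint.
grouped_mean = sum((lo + hi) / 2 * freq for lo, hi, freq in weight_table) / total

print(f"n = {total}, grouped mean = {grouped_mean:.1f} kg")
```

The grouped estimate is 63.5 kg. Both it and the reported mean lie above the median of 60 kg, which is what a tail towards the heavier weights (positive skew) produces.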

Frequency distribution of age variable

Mode (18 yrs) < Median (19 yrs) < Mean (20 yrs)



Descriptive statistics

Variable  Freq  Min    Max    Range  SD
Age       90    17.0   38.0   21.0   3.4
Height    89    152.0  191.0  39.0   8.8
Weight    90    40.0   120.0  80.0   14.5

Variable  Category  Freq  %
Sex       Male      32    35.6
          Female    58    64.4
          Total     90    100.0


What is Inferential
Statistics ?
It is the statistical technique/method used to
infer from the result of a sample (statistic) to the
population (parameter).

Population (Village A): µ = ?
Sample: x̄ = 10.14
The technique of inferring µ from x̄ is called
“Inferential Statistics”.


Statistical inference

Inferential statistics are used to draw
inferences about a population from a sample.

For example, the average number of decayed
teeth in children aged 5 years can be
estimated using observations from a sample
of 5-year-olds.


Selecting a sample from a population

How can a sample that is representative of the
population of interest be selected?

Answer: by random selection

When a random sample is drawn from the population
of interest, every member of the population has the
same probability, or chance, of being selected in the
sample.

For this reason, random samples are considered to
be unbiased.
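
A simple random sample can be sketched with Python's `random` module. The population of 119 numbered members and the sample size of 20 are hypothetical values chosen for illustration; the seed is fixed only so the draw can be repeated.

```python
import random

# Hypothetical sampling frame: 119 population members numbered 1..119.
population_ids = list(range(1, 120))

rng = random.Random(2010)  # fixed seed, for reproducibility only

# Draw 20 members without replacement; every member has the same
# chance of being selected.
sample_ids = rng.sample(population_ids, k=20)

print(sorted(sample_ids))
```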



Two types of Inferential Statistics

Parameter Estimation

Hypothesis testing


1. Parameter estimation

• Parameter estimation takes two
forms:
• 1. Point estimation
• 2. Interval estimation



Definition

• A point estimate is a single numerical
value used to estimate the
corresponding population parameter
• An interval estimate consists of two
numerical values defining a range of
values that, with a specific degree of
confidence, we feel includes the
parameter being estimated

Parameter estimation

Point estimate is when an estimate of the population
parameter is given as a single number, e.g. sample
mean, median, variance, standard deviation.
Interval estimation involves more than one point; it
gives a range of values within which the population
parameter is thought to lie. This range is a confidence
interval, defined by its lower and upper limits.
Point and interval estimates let us infer the true
value of an unknown population parameter using
information from a random sample of that
population.



Confidence intervals (cont.)
Example
Suppose a paper reports that, among a sample of 2,823
5–6-year-old children living in Sharjah, the mean number of
decayed teeth is 0.81 (SD = 1.66) with a 95% confidence
interval of (0.75, 0.87).

Interpretation
The 95% confidence interval is the range in the mean number
of decayed teeth we would expect in a population of 5–6-year-old
children living in Sharjah.
Because only a sample of children was used, the exact
population mean cannot be known for certain.
Hence, the 95% confidence interval indicates the margin of
imprecision due to sampling error.
Or, alternatively, you could think of it as a range that we are
95% confident contains the true population mean.
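
The interval quoted in this example can be reproduced from the reported mean, SD and sample size. A minimal sketch, assuming the usual large-sample form mean ± 1.96 × SD/√n; the function name `confidence_interval` is illustrative.

```python
import math

def confidence_interval(mean, sd, n, multiplier=1.96):
    """Return (lower, upper) limits: mean ± multiplier × SD / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = multiplier * standard_error
    return mean - margin, mean + margin

# Values reported in the example: n = 2,823, mean = 0.81, SD = 1.66
low, high = confidence_interval(mean=0.81, sd=1.66, n=2823)
print(f"95% CI = ({low:.2f}, {high:.2f})")
```

Rounded to two decimals this gives (0.75, 0.87), matching the reported interval.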


1. Estimation (CI)
Population: µ = ?
Sample: x̄ = 10.14, s.d = 4.3, n = 100

$\text{S.E} = \dfrac{\text{s.d}}{\sqrt{n}} = \dfrac{4.3}{\sqrt{100}} = 0.43$

$\text{CI} = \bar{x} \pm \{ t_{\alpha/2} \times \text{S.E} \}$
$95\%\ \text{CI} = \bar{x} \pm \{ t_{0.025} \times \text{S.E} \}$
$95\%\ \text{CI} = 10.14 \pm \{ 1.98 \times 0.43 \}$
$95\%\ \text{CI} = 10.14 \pm 0.8514$
$95\%\ \text{CI} = 9.29,\ 10.99$

(Here t0.025 ≈ 1.98 for 99 degrees of freedom; for large
samples it approaches 1.96.)

1. Estimation (CI)
Population: µ = ?
Sample: x̄ = 10.14

95% CI = 9.29, 10.99
We are 95% sure that the mean of the population will lie
between 9.29 and 10.99.

99% CI = 9.02, 11.26
For 99%, replace 1.96 with 2.58.
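
A quick check of the arithmetic in this example: a margin of 0.8514 corresponds to a multiplier of about 1.98 (roughly the t value for 99 degrees of freedom), while the large-sample value 1.96 gives a slightly narrower interval. The sketch below shows both; the helper name `interval` is illustrative.

```python
import math

mean, sd, n = 10.14, 4.3, 100   # sample values from the example
se = sd / math.sqrt(n)          # standard error = 4.3 / 10 = 0.43

def interval(multiplier):
    """Return (margin, lower, upper) for mean ± multiplier × SE."""
    margin = multiplier * se
    return margin, mean - margin, mean + margin

for label, multiplier in (("z = 1.96", 1.96), ("t ≈ 1.98", 1.98)):
    margin, low, high = interval(multiplier)
    print(f"{label}: 10.14 ± {margin:.4f} -> ({low:.2f}, {high:.2f})")
```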


95% Confidence interval formula

$\text{Estimate} \pm 1.96 \times \left( \dfrac{\text{Std. Dev}}{\sqrt{n}} \right)$

where the Estimate is, e.g., the Mean, and Std. Dev / √n is the Std. error.

Standard deviation vs. Standard error of the statistic
These two statistics are used for very different
purposes.
Standard deviation is a measure of spread of a set of
observations.
Standard error measures sampling error and is used
to indicate the precision of a statistic, i.e. how close
the statistic is to the parameter it is estimating.

Standard error example

$\text{Estimate} \pm 1.96 \times \left( \dfrac{\text{Std. Dev}}{\sqrt{n}} \right)$, with Estimate = 0.81

Standard error of the mean
$\text{Std Error} = \dfrac{1.66}{\sqrt{2823}} = 0.03$
In a sample of 2,823 5–6-year-old children living in Sharjah, the
mean number of decayed teeth is 0.81 and std deviation is 1.66.
The standard error is approximately 0.03.
So our best estimate of the population mean is 0.81, and a
sample mean from a sample of this size would typically be off
by about 0.03 points.
Standard error of the sample mean gives an indication of the
extent to which the sample mean deviates from the population
mean.
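
The idea that the standard error describes how much sample means vary can be illustrated by simulation. The sketch below draws many samples from a hypothetical normal population (mean 10, SD 4.3, both assumed for illustration) and compares the spread of the sample means with SD/√n.

```python
import random
import statistics

rng = random.Random(42)                 # fixed seed, for reproducibility only
POP_MEAN, POP_SD, N = 10.0, 4.3, 100    # hypothetical population and sample size

# Draw 2,000 samples of size N and record each sample mean.
sample_means = []
for _ in range(2000):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

sd_of_means = statistics.stdev(sample_means)   # observed spread of sample means
theoretical_se = POP_SD / N ** 0.5             # SD / sqrt(n) = 0.43

print(f"SD of sample means = {sd_of_means:.3f}, SD/sqrt(n) = {theoretical_se:.3f}")
```

The two values should be close, which is why SD/√n is used as the standard error of the mean.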


2. Hypothesis Testing


What is hypothesis
testing?
In Estimation, we estimate a population parameter
from a sample statistic

In Hypothesis testing, we answer a specific
question related to a population parameter



Hypothesis testing

• A (statistical) hypothesis is a statement of
belief about population parameters
• It is a predominant feature of quantitative
research in oral health & health care
research in general
• Researchers can test a hypothesis to see
whether the collected data support or
refute such hypothesis


2 types of hypotheses

• The null hypothesis, symbolized by
Ho; proposes no relationship
between 2 variables or no effect in
the population
• The alternative hypothesis,
symbolized by Ha; is a statement
that disagrees with the null
hypothesis.


• If the null hypothesis is rejected as a result
of sample evidence, then the alternative
hypothesis is concluded
• If the evidence is insufficient to reject, the
null hypothesis is retained, but not
accepted
• Traditionally, researchers do not accept the
null hypothesis from current evidence;
they state that it cannot be rejected


Example

A toothpaste company claims that their toothpaste
contains, on average, 1100 ppm of fluoride.
Suppose we are interested in testing this claim. We
will randomly sample 100 tubes (i.e., n=100) of
toothpaste from this company and under identical
conditions calculate the average fluoride content (in
ppm) for this sample.
From the sample of 100 tubes of toothpaste, the
average ppm was found to be 1035 (= x̄).
Could this sample have been drawn from a
population with mean fluoride content of µ=1,100
(known variance σ²=200)?



Basic steps in hypothesis testing
1. Propose a research question (identify the parameter
of interest).
2. State the null hypothesis, H0, and the alternative
hypothesis, HA.
3. Define a threshold value for declaring a P-value
significant. This threshold, called the significance
level of the test, is denoted by alpha (α) and is
commonly set to 0.05.
4. Select the appropriate statistical test to compute the
P-value.


Basic steps in hypothesis testing (cont.)

5. Compare the P-value of your test to the chosen
level of significance. Can the null hypothesis be
rejected?
6. If P-value < α , conclude that the difference is
statistically significant and decide to reject the
null hypothesis.
If P-value ≥ α, conclude that the difference is
not statistically significant and decide not to
reject the null hypothesis.
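
Steps 5 and 6 amount to a simple comparison, sketched below. The function name `decide` and the wording of the returned strings are illustrative; the rule itself (reject when P-value < α, otherwise do not reject) follows the steps above.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when P-value < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0 (statistically significant)"
    return "do not reject H0 (not statistically significant)"

for p in (0.01, 0.05, 0.08):
    print(f"P = {p}: {decide(p)}")
```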



Example

A toothpaste company (X) claims that their
toothpaste contains, on average, 1100 ppm of
fluoride.
What is the research question?



What is hypothesis testing?
Research Q: Is the mean fluoride content in
toothpaste X 1100ppm?
Ans: Yes or No
1) Null hypothesis: The mean fluoride content
in toothpaste X is equal to 1100ppm
Ho: µ = 1100

2) Alternative hypothesis : The mean fluoride
content in toothpaste X is not equal to 1100ppm
Ha: µ ≠ 1100



Define the significance level (commonly
set at 0.05)
Select appropriate test to compute the p value

At the end of the hypothesis testing, we
will get a P value.

If the P value is less than 0.05, we
reject the Null Hypothesis (Ho).

If the P value is more than or equal to
0.05, we cannot reject the Null
Hypothesis (Ho).

Q: Is the fluoride content in toothpaste X 1100ppm?

Ans: Yes or No
Ho: µ = 1100 Ha: µ ≠ 1100 x̄ = 1035; variance = 200; n = 100
In above example, if we get P=.01, we reject the null hypothesis
(Ho), then ……
We conclude as Alternative Hypothesis (Ha) … “the mean
fluoride content in toothpaste X is different from 1100ppm”.

Alternatively, we may report as ……
“the mean fluoride content is significantly different from
1100ppm”.
Note: (1) The second conclusion is more commonly used in the
literature.



Q: Is the fluoride content in toothpaste X 1100ppm?

Ans: Yes or No
Ho: µ = 1100 Ha: µ ≠ 1100 x̄ = 1035; variance = 200; n = 100
In above example, if we get P=.08, we CANNOT reject the null
hypothesis (Ho), then ……
We retain the Null Hypothesis (Ho) … “there is not enough
evidence that the mean fluoride content in toothpaste X is
different from 1100ppm”.
Alternatively, we may report as ……
“the mean fluoride content is NOT significantly different from
1100ppm”.


What is P value?
Q: Is the mean fluoride content in toothpaste X
1100ppm?
Ans: Yes or No
Ho: µ = 1100 Ha: µ ≠ 1100 x̄ = 1035; variance = 200; n = 100
If the P value is less than 0.05, we reject the Null Hypothesis.

P value is the probability of getting a result at least as extreme
as the one observed if the Null Hypothesis is true.

Example: P value=0.01. It means that …
If the Null Hypothesis were true, a difference this large would
arise by chance only 1% of the time, so we conclude as Alternative
Hypothesis (“significantly different”).

We normally allow less than a 5% chance of wrongly rejecting a
true Null Hypothesis.
That is why the cut-off point for P value is 0.05.


What is P value?
Q: Is the mean fluoride content in toothpaste X
1100ppm?
Ans: Yes or No
Ho: µ = 1100 Ha: µ ≠ 1100 x̄ = 1035; variance = 200; n = 100

P value is the probability of getting a result at least as extreme
as the one observed if the Null Hypothesis is true.

Example: P value=0.2. It means that …
If the Null Hypothesis were true, a difference this large would
arise by chance 20% of the time, which is too often to rule out
chance.

Therefore, we can’t conclude that it is “significantly different”. We
have to conclude that “the difference is not significant”.

What is P value?
Q: Is the mean fluoride content in toothpaste X
1100ppm?
Ans: Yes or No
Ho: µ = 1100 Ha: µ ≠ 1100 x̄ = 1035; variance = 200; n = 100

It means that we have set the cut-off point at P less than 0.05 to
reject the Ho.

We say this as …
We set the “Alpha” at 0.05.

Because the type of error that we have been talking about, is
called “Type I error” or “Alpha error”.



The use of P-values in hypothesis testing

Definition
The P-value is the smallest level of significance
that would lead to rejection of the null hypothesis
H0. (The p-value is the observed significance level.)
All statistical tests produce a P-value.

P-values answer the question: ‘Is there a
statistically significant difference between study
groups?’


P-values
Most scientific articles report a P-value associated
with a test. Generally, the P-value is compared to a
significance level (α) of 0.05 or 0.01 in order to
determine whether or not the result is statistically
significant.
Decision rules:
If P-value ≤ α then reject H0 at level α (a statistically
significant result).
If P-value > α then do not reject H0 at level α (not
statistically significant).
Example:
If P-value<0.05, this indicates that, if H0 were true, there
would be a less than 5% chance of observing results at least
as extreme as these. We reject H0 and conclude that the result is
significant.


Example (cont.)
So, our hypotheses are:
H0: µ = 1,100
HA: µ ≠ 1,100
The P-value for this test was found to be 0.0006.

What is your conclusion?
Since P-value < 0.05, we reject H0 in favour of HA, i.e.
we reject the original assumption that the sample was
drawn from a population where µ=1,100 and σ²=200.
We say that there is a significant difference between
the sample mean and the population mean at the 5%
level, i.e. if H0 were true there would be a less than 5%
chance (here, a 0.06% chance) of observing a result at
least this extreme.
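
The slides do not show how the P-value of 0.0006 was calculated. The sketch below assumes a one-sample z-test with known population spread, which is only one plausible choice. Taking σ² = 200 as printed gives z ≈ −46 and a P-value that is effectively zero; reading 200 as the standard deviation instead gives z = −3.25, whose one-sided tail area is about 0.0006. Under either reading, P < 0.05 and H0 is rejected.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def z_test(sample_mean, mu0, sigma, n):
    """One-sample z-test; returns (z, one-sided P, two-sided P)."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    one_sided = normal_cdf(-abs(z))
    return z, one_sided, 2 * one_sided

# Reading 1: variance = 200, as printed on the slide.
z_var, p1_var, p2_var = z_test(1035, 1100, math.sqrt(200), 100)

# Reading 2 (assumption): 200 is the standard deviation.
z_sd, p1_sd, p2_sd = z_test(1035, 1100, 200, 100)

print(f"variance = 200: z = {z_var:.2f}, two-sided P = {p2_var:.2g}")
print(f"SD = 200:       z = {z_sd:.2f}, one-sided P = {p1_sd:.4f}, "
      f"two-sided P = {p2_sd:.4f}")
```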

Types of error
When we sample, we select cases from a population
of interest. Due to chance variations in selecting the
sample’s few cases from the population’s many
possible cases, the sample will deviate from the
defined population’s true nature by a certain amount.
This is called sampling error.
Therefore, inferences from samples to populations
are always probabilistic, meaning we can never be
100% certain that our inference was correct.
Drawing the wrong conclusion is called an error of
inference.
There are two types of errors of inference defined in
terms of the null hypothesis: Type 1 error and Type 2
error.


Types of error (cont.)
Possibilities related to decisions about H0:

                          Actual situation
Investigator’s decision   H0 true                           H0 false
Accept H0                 (correct decision)                Type II Error
Reject H0                 Type I Error (probability = α)    (correct decision)
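
That the probability of a Type I error equals α can be checked by simulation. In the sketch below H0 is true by construction, so every rejection is a Type I error. The population values (mean 1100, SD 50), the use of a z-test and the number of trials are assumptions made for illustration.

```python
import math
import random

def two_sided_p(sample_mean, mu0, sigma, n):
    """Two-sided P-value of a one-sample z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(7)                      # fixed seed, for reproducibility only
MU0, SIGMA, N = 1100.0, 50.0, 100           # hypothetical population, H0 true
ALPHA, TRIALS = 0.05, 2000

rejections = 0
for _ in range(TRIALS):
    sample = [rng.gauss(MU0, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    if two_sided_p(sample_mean, MU0, SIGMA, N) < ALPHA:
        rejections += 1                     # H0 is true, so this is a Type I error

type_1_rate = rejections / TRIALS
print(f"Type I error rate = {type_1_rate:.3f} (alpha = {ALPHA})")
```

The observed rate should be close to 0.05, varying a little from run to run if the seed is changed.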



Type 1 and Type 2 errors can be quite difficult to
understand, so let’s look at a few examples to help
you grasp the concept.
Let’s hypothesise that two groups of dental patients
are equal in their knowledge of preventive hygiene
behaviours.
Now consider the following four scenarios. For
each, determine whether or not an error has been
made and, if so, what type of error.



1. You accept the null hypothesis when the groups are
really equal in oral self-care knowledge.
Answer:
2. You reject the null hypothesis when the groups are
really equal in oral self-care knowledge.
Answer:
3. You reject the null hypothesis when the groups are
really different in their oral self-care knowledge.
Answer:
4. You accept the null hypothesis when one group has
much more oral self-care knowledge than the other.
Answer:

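1. You accept the null hypothesis when the groups are
really equal in oral self-care knowledge.
Answer: Correct decision
2. You reject the null hypothesis when the groups are
really equal in oral self-care knowledge.
Answer: Type 1 error
3. You reject the null hypothesis when the groups are
really different in their oral self-care knowledge.
Answer: Correct decision
4. You accept the null hypothesis when one group has
much more oral self-care knowledge than the other.
Answer: Type 2 error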


Statistical vs Practical significance

Statistical significance does not necessarily imply that
the true difference in population means is of sufficient
magnitude to be of clinical importance.
Significance tests tell us whether a difference is
statistically significant but significance tests do not tell
us whether the difference is of practical importance.
In clinical practice we usually need to know the
presence and size of any difference.


Statistical vs Practical significance (cont.)

P-values only inform you on the likelihood of a
difference being attributable to chance.
As the sample size increases and the variance
decreases, small differences in mean values may
provide statistically significant results.
Whether these ‘statistically significant’ differences are
of any practical or clinical significance requires
judgement on the part of the clinician.
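
The effect of sample size can be sketched numerically. The code below holds a small difference fixed (0.5 units between two group means, with SD 10 in each group; both hypothetical) and shows the two-sided P-value of a two-sample z-test shrinking as the number per group grows.

```python
import math

def two_sample_p(diff, sd, n_per_group):
    """Two-sided P-value for a difference in means (z-test, equal SD and n)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = diff / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical values: same small difference, increasing sample size.
for n in (100, 1_000, 10_000):
    p = two_sample_p(diff=0.5, sd=10, n_per_group=n)
    print(f"n per group = {n:>6}: P = {p:.4f}")
```

With 100 per group the difference is far from significant, while with 10,000 per group the same difference is statistically significant. Whether it matters clinically is a separate question.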



Statistical vs. Practical significance
example
Consider a study comparing a new antihypertensive medication (A)
with a standard antihypertensive medication (B).
(Suppose drug A has additional side effects and is more expensive
than drug B.)
Results
1. Blood pressures of patients receiving A were significantly lower
than those on B (p-value=0.0001).
2. Difference in blood pressure between the groups was 5mmHg.

Interpretation
1. The probability of a difference this large, or larger, being
attributable to chance is about 0.01%, or 1 in 10,000.
2. But, given the small difference found between the groups, we
might consider this difference to be too small to offset the
difficulties, side effects and expense associated with drug A.
3. The effect is smaller than clinically meaningful, so we have
statistical significance but not clinical/practical significance.

