Haileab.F(BSc, MPH)
University of Gondar
College of medicine and health science
Department of Epidemiology and Biostatistics
Statistical Inference
Objectives
• After completing this session you will be able to:
• Understand the basics of statistical inference
• Apply statistical inference to real data sets
Introduction # 1
• Statistical inference is the process of generalizing or drawing conclusions about the target population on the basis of results obtained from a sample.
Introduction #2
 Statistical inference can be either parametric or
non-parametric
 Example: The path way for the analysis of
continuous variables is shown below
Introduction #3
 We have two facts that are key to statistical
inference.
• Population parameters are fixed numbers whose values
are usually unknown
• Sample statistics are known values for any given
sample, but vary from sample to sample taken from the
same population.
• This variability of sample statistics(sampling
variation) is always present and must be
accounted for in any inferential procedure by
identifying probability distributions that describe
the variability of sample statistics.
Introduction #4
 The frequency distribution of all these samples forms the
sampling distribution of the sample statistic
Introduction #5
 This sampling distribution has characteristics that can be related to
those of the population from which the sample is drawn.
 This relationship is usually provided by the parameters of the probability
distribution describing the population.
 E.g. Sampling Distribution of the means and proportions
Illustrative examples of Distribution of Sample
Mean
 Consider an experiment consisting of drawing two disks
from five, replacing the first before drawing the second,
and then computing the mean of the values on the two
disks.
Properties of the sampling distribution of means
1. The mean of the sampling distribution of means is the same as the population mean, μ.
2. The SD of the sampling distribution of means is σ/√n (the standard error).
3. The shape of the sampling distribution of means is approximately a normal curve, regardless of the shape of the population distribution, provided n is large enough.
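The first two properties can be checked by brute force for the two-disk experiment. The sketch below is only an illustration: the disk values 1 to 5 are an assumption (the slide showing the actual values is not in the text), and the names `population` and `sample_means` are made up for this example.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# Hypothetical disk values; the original slide's values are not available.
population = [1, 2, 3, 4, 5]
n = 2  # disks drawn per sample, with replacement

# Enumerate all 25 equally likely ordered samples and take each sample's mean.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation

mean_of_means = mean(sample_means)   # property 1: equals mu
sd_of_means = pstdev(sample_means)   # property 2: equals sigma / sqrt(n)

print("mean of sample means:", mean_of_means, "population mean:", mu)
print("SD of sample means:", sd_of_means, "sigma/sqrt(n):", sigma / sqrt(n))
```

Property 3 (approximate normality) needs a larger n than 2 to show up clearly, so it is not checked here.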
Haileab.f (MPH) 3/1/2023
Assignment
 Other Sampling Distributions
 Eg. T-distribution, chi-square distributions , F
distributions etc.
 Relationships among the Distributions
Principles of Inference
 As we have repeatedly noted, one of the primary
objectives of a statistical analysis is to use data
from a sample to make inferences about the
population from which the sample was drawn.
 A statistical inference is composed of two parts:
1. A statement about the value of that parameter, and
2. a measure of the reliability of that statement,
usually expressed as a probability
 Traditionally statistical inference is done with
one of two different but related objectives in
mind.
Assumptions
 Two major assumptions are needed to assure
the correctness for statistical inferences:
• randomness of the sample observations, and
• the distribution of the variable(s) being
studied.
Principles of Inference …...
• Tests of hypotheses, in which we hypothesize that one or more parameters have some specific values or relationships, and make our decision about the parameter(s) based on one or more sample statistics. In this type of inference, the reliability of the decision is the probability that the decision is incorrect.
• Estimation of one or more parameters using sample statistics. This estimation is usually done in the form of an interval, and the reliability of this inference is expressed as the level of confidence we have in the interval.
Estimation
 Two methods of estimation are commonly used: point
estimation and interval estimation
1. Point estimation: - A single numerical value
used to estimate the corresponding population
parameter
2. Interval estimation: Is a range (an interval) of
values used to estimate the true values of a
population parameter, with a specified degree of
confidence.
Confidence Interval (CI) estimate of a parameter
CI = Estimator ± (reliability coefficient) x (standard
error)
Confidence Level: the probability 1 – α that is the proportion
of times that the confidence interval actually does contain
the population parameter, assuming that the estimation
process is repeated a large number of times.
 Also written (1 - α) = .95
Definition/Interpretation : 95% CI
1. Probabilistic interpretation:
 If all possible random samples (an infinite number) of a
given sample size (e.g. 10 or 100) were obtained and if
each were used to obtain its own CI, then 95% of all such
CIs would contain the unknown population parameter; the
remaining 5% would not.
 It is incorrect to say “There is a 95% probability that the CI
contains the unknown population parameter”.
2. Practical interpretation
 When sampling is from a normally distributed
population with known standard deviation, we
are 100 (1-α) [e.g., 95%] confident that the
single computed interval contains the unknown
population parameter.
Confidence intervals…
• The 95% confidence interval is calculated in such a way that, under the conditions assumed for the underlying distribution, the interval will contain the true population parameter 95% of the time.
• Loosely speaking, you might interpret a 95% confidence interval as one which you are 95% confident contains the true parameter.
• A 90% CI is narrower than a 95% CI since we are only 90% certain that the interval includes the population parameter.
• On the other hand, a 99% CI will be wider than a 95% CI; the extra width means that we can be more certain that the interval will contain the population parameter. To obtain a higher confidence from the same sample, we must be willing to accept a larger margin of error (a wider interval).
• As the confidence level increases, the interval becomes wider:
• a 99% CI is wider than a 95% CI
• a 95% CI is wider than a 90% CI
• The larger the sample size, the narrower the CI and the more precise our estimate.
Confidence intervals…
 For a given confidence level (i.e. 90%, 95%, 99%)
the width of the confidence interval depends on
the standard error of the estimate which in turn
depends on the
 1. Sample size:-The larger the sample size, the
narrower the confidence interval (this is to mean the
sample statistic will approach the population parameter)
and the more precise our estimate. Lack of precision
means that in repeated sampling the values of the
sample statistic are spread out or scattered. The result
of sampling is not repeatable.
Confidence intervals…
- To increase precision (of an SRS), use a larger sample. You can make the precision as high as you want by taking a large enough sample. The margin of error decreases as √n increases.
• 2. Standard deviation: The more the variation among the individual values, the wider the confidence interval and the less precise the estimate. As the sample size increases, the standard error (SD/√n) decreases.
• Z is the value from the standard normal distribution (SND)
• 90% CI, z = 1.645
• 95% CI, z = 1.96
• More variation gives a wider CI and a less precise estimate
• Increasing the sample size decreases the standard error
Estimation for Single Population
Margin of Error
(Precision of the estimate)
Example:
1. Waiting times (in hours) at a particular hospital are believed to be approximately normally distributed with a variance of 2.25 hr².
a. A sample of 20 outpatients revealed a mean waiting time of 1.52 hours. Construct the 95% CI for the estimate of the population mean.
b. Suppose that the mean of 1.52 hours had resulted from a sample of 32 patients. Find the 95% CI.
c. What effect does a larger sample size have on the CI?
a.
$$\bar{x} \pm z_{\alpha/2}\sqrt{\frac{\sigma^2}{n}} = 1.52 \pm 1.96\sqrt{\frac{2.25}{20}} = 1.52 \pm 1.96(0.33) = 1.52 \pm 0.65 = (0.87,\ 2.17)$$
• σ = standard deviation = square root of the variance (here √2.25 = 1.5 hr)
• We are 95% confident that the true mean waiting time is between 0.87 and 2.17 hrs.
• Although the true mean may or may not be in this interval, 95% of the intervals formed in this manner will contain the true mean.
• An incorrect interpretation is that there is a 95% probability that this interval contains the true population mean.
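Parts a to c of the waiting-time example can be reproduced numerically. This is a minimal sketch, assuming a known population variance; the helper name `z_interval` is hypothetical. Because it uses the unrounded z value and standard error, its limits can differ from the hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, variance, n, level=0.95):
    """Confidence interval for a mean when the population variance is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # reliability coefficient
    margin = z * sqrt(variance / n)                # z times the standard error
    return xbar - margin, xbar + margin

ci_a = z_interval(1.52, 2.25, 20)  # part a: n = 20
ci_b = z_interval(1.52, 2.25, 32)  # part b: n = 32

print("n = 20:", ci_a)
print("n = 32:", ci_b)
# Part c: the interval for n = 32 is narrower than the one for n = 20.
print("narrower with larger n:", (ci_b[1] - ci_b[0]) < (ci_a[1] - ci_a[0]))
```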
B. Unknown variance
(small sample size, n ≤ 30)
• What if the σ for the underlying population is unknown and the sample size is small?
• As an alternative we use Student’s t distribution.
Example
t-value at the 90% confidence level with 19 df = 1.729
• CI = x̄ ± (confidence coefficient) × SE
• Confidence coefficient = tabulated t value for the given confidence level
• SE (standard error) = √(variance/n), or equivalently SD/√n
2. CIs for single population
proportion, p
 Is based on three elements of CI.
 Point estimate
 SE of point estimate
 Confidence coefficient
Example 1
 A random sample of 100 people shows that 25
are left-handed. Form a 95% CI for the true
proportion of left-handers.
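As a sketch of how the three elements (point estimate, standard error, confidence coefficient) combine for this example, the following computes the large-sample (normal approximation) interval. The function name `proportion_interval` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, level=0.95):
    """Large-sample (normal approximation) CI for a single proportion."""
    p = successes / n                              # point estimate
    se = sqrt(p * (1 - p) / n)                     # SE of the point estimate
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # confidence coefficient
    return p - z * se, p + z * se

low, high = proportion_interval(25, 100)
print(f"95% CI for the proportion of left-handers: ({low:.3f}, {high:.3f})")
```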
Interpretation
Hypothesis testing
• A hypothesis usually results from speculation concerning observed behavior, natural phenomena, or established theory.
• If the hypothesis is stated in terms of population parameters such as the mean and variance, the hypothesis is called a statistical hypothesis, whereas the procedure that uses sample data to decide about it is called a test of the hypothesis.
Types of hypotheses
• Null hypothesis (represented by H0) is a statement about the value of one or more population parameters. That is, the null hypothesis postulates that ‘there is no difference between factor and outcome’ or ‘there is no intervention effect’.
• Alternative hypothesis (represented by HA) states the ‘opposing’ view that ‘there is a difference between factor and outcome’ or ‘there is an intervention effect’.
• The alternative hypothesis is declared to be accepted if the null hypothesis is rejected.
Steps in Hypothesis Testing
1. Formulate the appropriate statistical hypotheses clearly
• Specify H0 and HA
H0: μ = μ0      H0: μ ≤ μ0      H0: μ ≥ μ0
H1: μ ≠ μ0      H1: μ > μ0      H1: μ < μ0
two-tailed      one-tailed      one-tailed
2. State the assumptions necessary for computing probabilities
• The distribution is approximately normal (Gaussian)
• Variance is known or unknown
3. Select a sample and collect data
• Categorical, continuous
4. Decide on the appropriate test statistic for the hypothesis. E.g., One population
5. Specify the desired level of significance for the statistical test (α = 0.05, 0.01, etc.)
6. Determine the critical value.
• A value the test statistic must attain to be declared significant.
• Two-tailed (α = 5%): −1.96 and 1.96
• One-tailed, > (α = 5%): 1.645
• One-tailed, < (α = 5%): −1.645
7. Obtain sample evidence and compute the test statistic
8. Reach a decision and draw the conclusion
• If Ho is rejected, we conclude that HA is true (or accepted).
• If Ho is not rejected, we conclude that Ho may be true.
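The critical values in step 6 come from the standard normal distribution and can be looked up in code rather than in a table. The sketch below is illustrative only; the function name `critical_values` and its `tail` argument are assumptions, not part of the original slides.

```python
from statistics import NormalDist

def critical_values(alpha, tail="two"):
    """Standard normal critical value(s) for a given significance level."""
    std_normal = NormalDist()
    if tail == "two":
        c = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split over both tails
        return (-c, c)
    if tail == "upper":
        return (std_normal.inv_cdf(1 - alpha),)  # H1: parameter > hypothesized
    if tail == "lower":
        return (std_normal.inv_cdf(alpha),)      # H1: parameter < hypothesized
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

print("two-tailed, alpha = 0.05:", critical_values(0.05))
print("upper-tailed, alpha = 0.05:", critical_values(0.05, "upper"))
print("lower-tailed, alpha = 0.05:", critical_values(0.05, "lower"))
```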
Types of Errors in Hypothesis Tests
• Whenever we reject or fail to reject Ho, we risk committing an error.
• Two types of errors can be committed:
• Type I error
• Type II error
Rule of decision making
• The rejection or critical region is the range of values of a sample statistic that will lead to rejection of the null hypothesis.
• Obviously we cannot make both types of errors simultaneously, and in fact we may not make either, but the possibility does exist.
• In fact, we will usually never know whether any error has been committed. The only way to avoid any chance of error is not to make a decision.

Type of decision   H0 true                    H0 false
Reject H0          Type I error (α)           Correct decision (1 − β)
Accept H0          Correct decision (1 − α)   Type II error (β)
Test Statistics
• A test statistic is a value we can compare with the known distribution of what we expect when the null hypothesis is true.
• The general formula of the test statistic is:
Test statistic = (Observed value − Hypothesized value) / Standard error
The P- Value
• In most applications, the outcome of performing a hypothesis test is to produce a p-value.
• The p-value is the probability of obtaining a difference at least as large as the one observed if chance alone were operating, that is, if the null hypothesis were true.
• A small p-value implies that the probability of the observed value occurring just by chance is low when the null hypothesis is true.
• That is, a small p-value suggests that there might be sufficient evidence for rejecting the null hypothesis.
• The p-value is defined as the probability of observing the computed test statistic or a more extreme one, if H0 is true. For example, P[Z ≥ Zcal | H0 true].
P-value……
 A p-value is the probability of getting the observed
difference, or one more extreme, in the sample purely
by chance from a population where the true difference is
zero.
 An “empirical” significance level or indicator of
the weight of evidence against the null
hypothesis.
How to calculate P-value
o Use statistical software like SPSS, SAS……..
o Hand calculations
—obtain the test statistic (Z calculated or t-calculated)
—find the probability for the test statistic from the standard normal table (a table giving the area between 0 and z)
—subtract the probability from 0.5
—the result is the one-tailed P-value
Note: if the test is two-tailed, multiply the result by 2.
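The hand calculation above can be mirrored in a few lines for a z statistic. This is a sketch, not a replacement for statistical software; the function name `p_value_from_z` is hypothetical. It uses the cumulative distribution function directly, which is equivalent to subtracting the table area between 0 and z from 0.5.

```python
from statistics import NormalDist

def p_value_from_z(z_cal, two_tailed=True):
    """P-value for a calculated z statistic under the standard normal."""
    # Area in the tail beyond |z_cal|.
    tail_area = 1 - NormalDist().cdf(abs(z_cal))
    # For a two-tailed test, multiply the tail area by 2.
    return 2 * tail_area if two_tailed else tail_area

print("z = 1.96, two-tailed:", p_value_from_z(1.96))
print("z = 1.645, one-tailed:", p_value_from_z(1.645, two_tailed=False))
```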
The P- Value …..
• But for what values of the p-value should we reject the null hypothesis?
• By convention, a p-value of 0.05 or smaller is considered sufficient evidence for rejecting the null hypothesis.
• By using a cut-off of 0.05, we are allowing a 5% chance of wrongly rejecting the null hypothesis when it is in fact true.
• When the p-value is less than 0.05, we often say that the result is statistically significant.
Hypothesis testing for single population mean
• EXAMPLE 1: A researcher claims that the mean IQ of university students is 100, with a standard deviation of 10. A sample of 16 students has a mean of 110. Test the hypothesis.
• Solution
1. Ho: µ = 100 vs HA: µ ≠ 100
2. Assume α = 0.05
3. Test statistic: z = (110 − 100)/(10/√16) = 10/2.5 = 4
4. z-critical at α/2 = 0.025 is equal to 1.96.
5. Decision: reject the null hypothesis since 4 ≥ 1.96
6. Conclusion: the mean IQ of the population is different from 100.
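The same steps can be sketched in code for the IQ example, assuming the population standard deviation of 10 is known; `one_sample_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """z statistic for a single mean with known population SD."""
    return (xbar - mu0) / (sigma / sqrt(n))

alpha = 0.05
z_cal = one_sample_z(xbar=110, mu0=100, sigma=10, n=16)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
reject_h0 = abs(z_cal) >= z_crit

print("z calculated:", z_cal)
print("z critical:", round(z_crit, 2))
print("reject Ho:", reject_h0)
```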
Hypothesis testing for single proportions
• Example: In a study of childhood abuse in psychiatric patients, Brown found that 166 in a sample of 947 patients reported histories of physical or sexual abuse.
a) Construct a 95% confidence interval for the population proportion.
b) Test the hypothesis that the true population proportion is 30%.
• Solution (a)
• The sample proportion is p = 166/947 = 0.175, and the 95% CI for P is given by
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = 0.175 \pm 1.96\sqrt{\frac{0.175 \times 0.825}{947}} = 0.175 \pm 1.96(0.0124) = [0.151;\ 0.2]$$
Example……
• To test the hypothesis we need to follow these steps
Step 1: State the hypothesis
Ho: P = Po = 0.3
Ha: P ≠ Po (P ≠ 0.3)
Step 2: Fix the level of significance (α = 0.05)
Step 3: Compute the calculated and tabulated values of the test statistic
$$z_{cal} = \frac{p - P_o}{\sqrt{\frac{P_o(1-P_o)}{n}}} = \frac{0.175 - 0.3}{\sqrt{\frac{0.3(0.7)}{947}}} = \frac{-0.125}{0.0149} = -8.39, \qquad z_{tab} = 1.96$$
Example……
• Step 4: Compare the calculated and tabulated values of the test statistic
• Since the absolute value of the calculated statistic (8.39) is larger than the tabulated value (1.96), we reject the null hypothesis.
• Step 5: Conclusion
• Hence we conclude that the proportion of childhood abuse in psychiatric patients is different from 0.3.
• If the sample size is small (np < 5 or n(1−p) < 5), the normal approximation is unreliable and an exact method based on the binomial distribution should be used instead of the z statistic.
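Both parts of the childhood-abuse example can be checked with a short script. This is a sketch under the large-sample normal approximation; the variable names are illustrative. It keeps the unrounded sample proportion 166/947, so the z statistic differs slightly from the hand value obtained with p rounded to 0.175.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0, alpha = 166, 947, 0.30, 0.05
p_hat = x / n
z_tab = NormalDist().inv_cdf(1 - alpha / 2)

# (a) 95% CI: the standard error uses the sample proportion.
se_ci = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_tab * se_ci, p_hat + z_tab * se_ci)

# (b) z test: the standard error uses the hypothesized proportion Po.
se_h0 = sqrt(p0 * (1 - p0) / n)
z_cal = (p_hat - p0) / se_h0
reject_h0 = abs(z_cal) > z_tab

print("95% CI:", ci)
print("z calculated:", z_cal, "reject Ho:", reject_h0)
```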
Two sample mean and proportion
• Until now we have seen estimation for only a single mean and a single proportion. However, it is also possible to compute point and interval estimates for the difference of two population means.
• Let x1, x2, …, xn1 be a sample from the first population and y1, y2, …, yn2 be a sample from the second population.
• Let the sample mean for the first population be $\bar{X}$
• and the sample mean for the second population be $\bar{Y}$
• Then the point estimate for the difference of means (µ1 − µ2) is given by $(\bar{X} - \bar{Y})$
Two sample estimation
• A (1−α)100% confidence interval for the difference of means, if $\sigma_1$ and $\sigma_2$ are known, is given by
$$(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Hypothesis testing for two sample means
• The steps to test the hypothesis for a difference of means are the same as for a single mean
Step 1: State the hypothesis
Ho: µ1 − µ2 = 0
VS
HA: µ1 − µ2 ≠ 0, HA: µ1 − µ2 < 0, or HA: µ1 − µ2 > 0
Step 2: Significance level (α)
Step 3: Test statistic
$$z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Example
• Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome. The data consist of serum uric acid readings on 12 individuals with Down's syndrome and 15 normal individuals. The sample means are 4.5 mg/100 ml and 3.4 mg/100 ml, and the population standard deviations are taken to be 2.9 and 3.5 mg/100 ml, respectively.
$$H_O: \mu_1 - \mu_2 = 0 \qquad H_A: \mu_1 - \mu_2 \neq 0$$
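SOLUTION
$$z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(4.5 - 3.4) - 0}{\sqrt{\frac{2.9^2}{12} + \frac{3.5^2}{15}}} = \frac{1.1}{\sqrt{1.5175}} = \frac{1.1}{1.23} = 0.89 < z_{0.025} = 1.96$$
Since the calculated value does not exceed 1.96, with these inputs we fail to reject Ho at α = 0.05.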
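A quick numeric check of the two-sample z statistic, using the figures stated in the example text (means 4.5 and 3.4, population SDs 2.9 and 3.5, sample sizes 12 and 15). The helper name `two_sample_z` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, delta0=0.0):
    """z statistic for a difference of two means with known population SDs."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    return ((mean1 - mean2) - delta0) / se

z_cal = two_sample_z(4.5, 3.4, 2.9, 3.5, 12, 15)
z_crit = NormalDist().inv_cdf(0.975)  # two-tailed, alpha = 0.05
reject_h0 = abs(z_cal) > z_crit

print("z calculated:", round(z_cal, 2))
print("z critical:", round(z_crit, 2))
print("reject Ho:", reject_h0)
```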
Estimation and hypothesis testing for two population proportions
• Let n1 and n2 be the sample sizes from the two populations. If x and y are the numbers of outcomes of interest in each sample, then the point estimate for each population proportion is given by p1 = x/n1 and p2 = y/n2 respectively.
• The point estimate of π1 − π2 is p1 − p2
• If the sample sizes are large, with n1p1 > 5, n1(1−p1) > 5, n2p2 > 5 and n2(1−p2) > 5, then the interval estimate for the difference of proportions is given by
$$(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
Hypothesis testing for two proportions
• To test the hypothesis
Ho: π1 − π2 = 0
VS
HA: π1 − π2 ≠ 0
The test statistic is given by
$$z_{cal} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$$
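To make the two-proportion statistic concrete, here is a small sketch. The counts (30 of 100 versus 20 of 100) are made-up numbers chosen only for illustration, and `two_proportion_z` is a hypothetical helper. It uses the unpooled standard error, matching the formula on this slide.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x, n1, y, n2, delta0=0.0):
    """z statistic for a difference of two proportions (unpooled SE)."""
    p1, p2 = x / n1, y / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ((p1 - p2) - delta0) / se

# Hypothetical data: 30 of 100 in group 1, 20 of 100 in group 2.
z_cal = two_proportion_z(30, 100, 20, 100)
z_crit = NormalDist().inv_cdf(0.975)  # two-tailed, alpha = 0.05

print("z calculated:", z_cal)
print("reject Ho at alpha = 0.05:", abs(z_cal) > z_crit)
```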
Summary
• Students sometimes have difficulty deciding whether to use zα/2 or tα/2 values when finding confidence intervals
t-test
One sample t-test:
 It is used to compare the estimate of a sample with a
hypothesized population mean to see if the sample
is significantly different.
 Assumptions which should be fulfilled before we use
this method:
 The dependent variable is normally distributed within the
population
 The data are independent (scores of one participant are not
dependent on scores of the other)
T-test cont…
• Hypothesis: Ho: μ = μo vs HA: μ ≠ μo,
where μo is the hypothesized mean value
The test statistic is: tcalc = (x̄ – μo)/(s/√n)
• We compare the calculated test statistic (tcalc) with the tabulated value (ttab) at n-1 degrees of freedom
T-test cont…
E.g. Data: The distance covered by marathon runners until physiological stress develops, and whether or not they used a drug

No   Distance (miles)   Drug use
1    14.5               no
2    13.4               no
3    14.8               yes
4    19.5               yes
5    14.5               no
6    18.2               yes
7    16.3               no
8    14.8               no
9    20.3               yes
10   18.4               yes
11   16.9               yes
12   12.6               no
13   13.4               no
14   16.3               yes
15   17.1               yes
16   11.8               no
17   13.3               yes
18   14.5               no

Mean distance = 15.59 miles; standard deviation = 2.43 miles
T-test cont..
It is believed that the mean distance covered before feeling physiological stress is 15 miles
Hypotheses: Ho: μ = 15 versus HA: μ ≠ 15
Level of significance: α = 5%
x̄ = 15.59, S = 2.43, n = 18
tcalc = (x̄ – μ)/(s/√n) = (15.59 – 15)/0.57 = 1.03, and p-value = 0.318
At 17 degrees of freedom and α = 0.05, ttab = 2.110
Since tcalc = 1.03 < 2.110 = ttab, or α = 0.05 < 0.318 = p-value,
we fail to reject Ho
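The one-sample t statistic can be recomputed from the 18 distances in the data table. This sketch uses only the standard library, so it compares against the tabulated critical value quoted above (2.110 at 17 df) instead of computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev

# Distances (miles) for runners 1 to 18, copied from the data table.
distances = [14.5, 13.4, 14.8, 19.5, 14.5, 18.2, 16.3, 14.8, 20.3,
             18.4, 16.9, 12.6, 13.4, 16.3, 17.1, 11.8, 13.3, 14.5]

mu0 = 15                # hypothesized mean distance
n = len(distances)
xbar = mean(distances)
s = stdev(distances)    # sample SD (n - 1 in the denominator)
t_cal = (xbar - mu0) / (s / sqrt(n))

t_tab = 2.110           # tabulated value at 17 df, alpha = 0.05 (from the slide)
reject_h0 = abs(t_cal) > t_tab

print("mean:", round(xbar, 2), "SD:", round(s, 2))
print("t calculated:", round(t_cal, 2), "reject Ho:", reject_h0)
```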
Two sample t- test
 A t-distribution can be used for testing hypotheses
about differences of means for independent samples if
both populations are normal and have the same
variances.
 Assesses whether the means of two samples are
statistically different from each other. This analysis is
appropriate whenever you want to compare the means
of two samples/ conditions
 Assumptions of a t-test:
 from a parametric population
 not (seriously) skewed
 no outliers
[Figure: error bars showing the 95% CI of "infer comp" by lesion site (left hemisphere vs right hemisphere)]
t-tests….
 Compare the mean between 2 samples/ conditions
 if 2 samples are taken from the same population,
then they should have fairly similar means
 if 2 means are statistically different, then the
samples are likely to be drawn from 2 different
populations, ie they really are different
Exp. 1 Exp. 2
T-test cont..
b. Paired t-test
• Each observation in one sample has one and only one mate in the other sample, so the two samples are dependent on each other.
• For example, the paired measurements can be:
before and after (e.g. before and after an intervention), or repeated measurements (e.g. using digital and analog apparatus), or two dependent data sources (e.g. data from the mother and father of a respondent)
Hypothesis: Ho: μd = 0 vs HA: μd ≠ 0
T-test cont..
Example: The blood pressure (BP) of 10 mothers was measured before and after taking a new drug.

Subject   BP before   BP after   Difference (di)
1         130         110        -20
2         125         130        +5
3         140         120        -20
4         150         130        -20
5         120         110        -10
6         130         130        0
7         120         115        -5
8         135         130        -5
9         140         130        -10
10        130         120        -10

d̄ (average of d) = -9.5
Sd (standard deviation of d) = 8.64
T-test cont..
Hypothesis: Ho: μd = 0 versus HA: μd ≠ 0
Set the level of significance, α = 0.05
d̄ = -9.5, Sd = 8.64, n = 10
tcalc = (d̄ – μd)/(sd/√n) = (-9.5 – 0)/(8.64/√10) = -3.48, so |tcalc| = 3.48, and p-value = 0.0075
Decision rule: reject Ho if |tcalc| > ttab
At n-1 = 9 df and α = 0.05, ttab = 2.26
Since ttab = 2.26 < 3.48 = |tcalc|, or p-value = 0.0075 < 0.05 = α
We reject Ho
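The paired t statistic for the blood pressure example can be recomputed from the before and after readings. As above, this is a standard-library sketch that compares against the tabulated value (2.26 at 9 df) and does not compute the p-value.

```python
from math import sqrt
from statistics import mean, stdev

# Blood pressure readings for the 10 mothers, copied from the table.
bp_before = [130, 125, 140, 150, 120, 130, 120, 135, 140, 130]
bp_after = [110, 130, 120, 130, 110, 130, 115, 130, 130, 120]

# Paired differences d_i = after - before.
diffs = [after - before for before, after in zip(bp_before, bp_after)]

n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
t_cal = (d_bar - 0) / (s_d / sqrt(n))  # Ho: mean difference = 0

t_tab = 2.26  # tabulated value at 9 df, alpha = 0.05 (from the slide)
reject_h0 = abs(t_cal) > t_tab

print("mean difference:", d_bar, "SD of differences:", round(s_d, 2))
print("t calculated:", round(t_cal, 2), "reject Ho:", reject_h0)
```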
T-test cont..
c. Two independent samples t-test
 Used to compare two unrelated or independent groups
 Assumptions include:
 The variance of the dependent variable in the two
populations are equal
 The dependent variable is normally distributed within
each population
 The data are independent (scores of one participant
are not related systematically to the scores of the
others)
 Hypothesis: Ho: μt = μc Vs HA: μt ≠ μc ,
Where μt and μc are the population mean of treatment
and control (placebo) groups respectively.
The test statistics for two sample T-test cont….
• There are three cases, which depend on what is known about the population variances.
Case 1:
• Population variances $\sigma_1^2$ and $\sigma_2^2$ are known for normal populations (or non-normal populations with both $n_1$ and $n_2$ large). In this case the test statistic is:
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Case 2:
• Population variances are unknown but assumed to be equal ($\sigma_1^2 = \sigma_2^2 = \sigma^2$) in normal populations. In this case, we pool our estimates to get the pooled two-sample variance, which estimates the unknown common variance $\sigma^2$:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
• And the test statistic is
$$T = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
• Which has a $t_{n_1+n_2-2}$ distribution if $H_0$ is true.
Case 3:
• $\sigma_1^2$ and $\sigma_2^2$ are unknown and unequal in normal populations. In this case the test statistic is given by:
$$T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
which does not have a known exact distribution. If both n1 and n2 are large (both over 30) we can assume a normal distribution.
Example
Do the marathon runners grouped by their drug intake status differ in their average distance covered before they feel any physiological stress?
Hypothesis: Ho: μt = μc vs HA: μt ≠ μc, where μt and μc are the population means for drug users and non-users respectively
Set the level of significance, α = 5%
x̄c = 13.98, sc = 1.33, x̄t = 17.20, st = 2.21 (nc = nt = 9)
tcalc = (x̄c – x̄t)/√(S²(1/nc + 1/nt)) = -3.741, and its p-value = 0.002
S² is the pooled (combined) variance of both groups.
At 16 df and α = 0.05 (two-tailed), ttab = ±2.12
Since tcalc = -3.741 < -2.12, or p-value = 0.002 < 0.05 = α
We reject Ho
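The pooled-variance t statistic for this example can be recomputed by splitting the marathon data by drug use. This is a sketch assuming equal population variances (Case 2); the variable names are illustrative, and the critical value 2.12 at 16 df is taken from the slide.

```python
from math import sqrt
from statistics import mean, variance

# Distances (miles) from the marathon table, split by drug use.
non_users = [14.5, 13.4, 14.5, 16.3, 14.8, 12.6, 13.4, 11.8, 14.5]  # "no"
users = [14.8, 19.5, 18.2, 20.3, 18.4, 16.9, 16.3, 17.1, 13.3]      # "yes"

n_c, n_t = len(non_users), len(users)
df = n_c + n_t - 2

# Pooled variance: each sample variance weighted by its degrees of freedom.
pooled_var = ((n_c - 1) * variance(non_users) + (n_t - 1) * variance(users)) / df

se = sqrt(pooled_var * (1 / n_c + 1 / n_t))
t_cal = (mean(non_users) - mean(users)) / se

t_tab = 2.12  # tabulated value at 16 df, alpha = 0.05 two-tailed (from the slide)
reject_h0 = abs(t_cal) > t_tab

print("df:", df, "pooled variance:", round(pooled_var, 3))
print("t calculated:", round(t_cal, 3), "reject Ho:", reject_h0)
```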
T-test cont…
 Here in the case of two independent sample t-test,
we have one continuous dependent variable
(interval/ratio data) and;
 one nominal or ordinal independent variable with
only two categories
 In this last case (i.e. two
independent sample t-test), what
if there are more than two
categories for the independent
variable we have?
Inferences for Two or More Means
• Are the birth weights of children in different geographical regions the same?
• Are the responses of patients to different medications and placebo different?
• Do people in different age groups have different proportions of body fat?
• Do people of different ethnicities have the same BMI?
Assignment
 Prevention of Violations of assumption
 Detection of Violations of assumptions
 goodness-of-fit tests
Thank you!

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 

Lecture-3 inferential stastistics.ppt

  • 1. Haileab.F(BSc, MPH) University of Gondar College of medicine and health science Department of Epidemiology and Biostatistics Statistical Inference
  • 2. Objectives: After completing this session you will be able to understand the basics of statistical inference and apply statistical inference to real data sets.
  • 3. Introduction #1: Inferential statistics is the process of generalizing or drawing conclusions about the target population on the basis of results obtained from a sample.
  • 4. Introduction #2: Statistical inference can be either parametric or non-parametric. Example: the pathway for the analysis of continuous variables is shown below.
  • 5. Introduction #3: Two facts are key to statistical inference. • Population parameters are fixed numbers whose values are usually unknown. • Sample statistics are known values for any given sample, but vary from sample to sample taken from the same population. • This variability of sample statistics (sampling variation) is always present and must be accounted for in any inferential procedure by identifying probability distributions that describe the variability of sample statistics.
  • 6. Introduction #4: The frequency distribution of a sample statistic over all possible samples forms the sampling distribution of that statistic.
  • 7. Introduction #5: This sampling distribution has characteristics that can be related to those of the population from which the sample is drawn. This relationship is usually provided by the parameters of the probability distribution describing the population. E.g., the sampling distributions of means and proportions.
  • 8. Illustrative example of the distribution of the sample mean: Consider an experiment consisting of drawing two disks from five, replacing the first before drawing the second, and then computing the mean of the values on the two disks.
  • 9. Properties of the sampling distribution of means: 1. The mean of the sampling distribution of means is the same as the population mean, $\mu$. 2. The SD of the sampling distribution of means is $\sigma/\sqrt{n}$ (the standard error). 3. The shape of the sampling distribution of means is approximately a normal curve, regardless of the shape of the population distribution, provided n is large enough. Haileab.f (MPH) 3/1/2023
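The first two properties can be checked by brute force on the two-disk experiment. The sketch below is only illustrative: the transcript does not give the values written on the five disks, so the values 1 to 5 are an assumption. It enumerates all 25 equally likely ordered samples of size 2 drawn with replacement and compares the mean and SD of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# Assumed disk values (hypothetical; the slide does not list them)
disks = [1, 2, 3, 4, 5]
n = 2  # disks drawn per sample, with replacement

# Every ordered sample of size 2 is equally likely, so the list of
# their means is the exact sampling distribution of the sample mean.
sample_means = [mean(sample) for sample in product(disks, repeat=n)]

mu = mean(disks)        # population mean
sigma = pstdev(disks)   # population standard deviation

print("mean of sample means:", mean(sample_means), "population mean:", mu)
print("SD of sample means:", pstdev(sample_means), "sigma/sqrt(n):", sigma / sqrt(n))
```

With only two draws the shape is not yet normal, so property 3 is not illustrated here; it would need a larger n.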
  • 10. Assignment: Other sampling distributions, e.g. the t-distribution, chi-square distribution, F distribution, etc. Relationships among the distributions.
  • 11. Principles of Inference: As we have repeatedly noted, one of the primary objectives of a statistical analysis is to use data from a sample to make inferences about the population from which the sample was drawn. A statistical inference is composed of two parts: 1. a statement about the value of a parameter, and 2. a measure of the reliability of that statement, usually expressed as a probability. Traditionally, statistical inference is done with one of two different but related objectives in mind.
  • 12. Assumptions: Two major assumptions are needed to ensure the correctness of statistical inferences: • randomness of the sample observations, and • the distribution of the variable(s) being studied.
  • 13. Principles of Inference (cont.): 1. Tests of hypotheses, in which we hypothesize that one or more parameters have some specific values or relationships, and make our decision about the parameter(s) based on one or more sample statistics. In this type of inference, the reliability of the decision is the probability that the decision is incorrect. 2. Estimation of one or more parameters using sample statistics. This estimation is usually done in the form of an interval, and the reliability of this inference is expressed
  • 14. Estimation: Two methods of estimation are commonly used: point estimation and interval estimation. 1. Point estimation: a single numerical value used to estimate the corresponding population parameter.
  • 15. 2. Interval estimation: a range (an interval) of values used to estimate the true value of a population parameter, with a specified degree of confidence. Confidence interval (CI) estimate of a parameter: CI = estimator ± (reliability coefficient) × (standard error)
  • 16. Confidence level: the probability 1 − α, that is, the proportion of times that the confidence interval actually contains the population parameter, assuming that the estimation process is repeated a large number of times. Also written (1 − α) = 0.95. Definition/interpretation of a 95% CI: 1. Probabilistic interpretation: If all possible random samples (an infinite number) of a given sample size (e.g. 10 or 100) were obtained and each were used to obtain its own CI, then 95% of all such CIs would contain the unknown population parameter; the remaining 5% would not. It is incorrect to say “There is a 95% probability that the CI contains the unknown population parameter”.
  • 17. 2. Practical interpretation: When sampling is from a normally distributed population with known standard deviation, we are 100(1 − α)% [e.g., 95%] confident that the single computed interval contains the unknown population parameter.
  • 18. Confidence intervals (cont.): The 95% confidence interval is calculated in such a way that, under the conditions assumed for the underlying distribution, the interval will contain the true population parameter 95% of the time. Loosely speaking, you might interpret a 95% confidence interval as one which you are 95% confident contains the true parameter. A 90% CI is narrower than a 95% CI since we are only 90% certain that the interval includes the population parameter. On the other hand, a 99% CI will be wider than a 95% CI; the extra width means that we can be more certain that the interval will contain the population parameter. But to obtain a higher confidence from the same sample, we must be willing to accept a larger margin of error (a wider interval).
  • 19. As the confidence level increases, the interval gets wider: a 99% CI is wider than a 95% CI, and a 95% CI is wider than a 90% CI. The larger the sample size, the narrower the CI and the more precise our estimate.
  • 20. Confidence intervals (cont.): For a given confidence level (i.e. 90%, 95%, 99%) the width of the confidence interval depends on the standard error of the estimate, which in turn depends on: 1. Sample size: The larger the sample size, the narrower the confidence interval (that is, the sample statistic will approach the population parameter) and the more precise our estimate. Lack of precision means that in repeated sampling the values of the sample statistic are spread out or scattered; the result of sampling is not repeatable.
  • 21. Confidence intervals (cont.): To increase precision (of an SRS), use a larger sample. You can make the precision as high as you want by taking a large enough sample. The margin of error decreases as √n increases. 2. Standard deviation: The more the variation among the individual values, the wider the confidence interval and the less precise the estimate. As the sample size increases, the standard error decreases. z is the value from the standard normal distribution (SND): for a 90% CI, z = 1.645; for a 95% CI, z = 1.96.
  • 22. More variation gives a wider CI and a less precise estimate. Increasing the sample size decreases the standard error.
  • 24. Estimation for Single Population
  • 25. Margin of Error (Precision of the estimate)
  • 26. Example: 1. Waiting times (in hours) at a particular hospital are believed to be approximately normally distributed with a variance of 2.25 hr². a. A sample of 20 outpatients revealed a mean waiting time of 1.52 hours. Construct the 95% CI for the estimate of the population mean. b. Suppose that the mean of 1.52 hours had resulted from a sample of 32 patients. Find the 95% CI. c. What effect does a larger sample size have on the CI?
  • 27. a. $1.52 \pm 1.96\sqrt{2.25/20} = 1.52 \pm 1.96(0.33) = 1.52 \pm 0.65 = (0.87,\ 2.17)$ • $\sigma$ = standard deviation = square root of the variance. • We are 95% confident that the true mean waiting time is between 0.87 and 2.17 hrs. • Although the true mean may or may not be in this interval, 95% of the intervals formed in this manner will contain the true mean. • An incorrect interpretation is that there is a 95% probability that this interval contains the true population mean.
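The waiting-time calculation can be reproduced with a short sketch. The helper name `z_interval` is ours, not from the slides; it assumes a known population variance and uses the standard normal quantile from Python's standard library rather than a rounded table value, so the limits may differ from the slide in the third decimal place. Running it for n = 20 and n = 32 also answers part (c): the larger sample gives the narrower interval.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, variance, n, confidence=0.95):
    """CI for a mean when the population variance is known (illustrative helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sqrt(variance / n)                      # z times the standard error
    return xbar - margin, xbar + margin

ci_20 = z_interval(1.52, 2.25, 20)  # part (a)
ci_32 = z_interval(1.52, 2.25, 32)  # part (b)

print("n = 20:", tuple(round(v, 2) for v in ci_20))
print("n = 32:", tuple(round(v, 2) for v in ci_32))
```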
  • 28. B. Unknown variance (small sample size, n ≤ 30): What if $\sigma$ for the underlying population is unknown and the sample size is small? As an alternative we use Student’s t distribution.
  • 29. Example: t-value at 90% CL with 19 df = 1.729
  • 31. CI = $\bar{x} \pm$ CC × SE, where the confidence coefficient (CC) is the tabulated t value for the given confidence level, and the standard error is SE = $\sqrt{\text{variance}/n}$ = SD/$\sqrt{n}$.
  • 32. 2. CIs for a single population proportion, p: based on the three elements of a CI: the point estimate, the SE of the point estimate, and the confidence coefficient.
  • 34. Example 1: A random sample of 100 people shows that 25 are left-handed. Form a 95% CI for the true proportion of left-handers.
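The transcript does not include the worked solution for this example, so the following is only a sketch of how the three elements (point estimate, standard error, confidence coefficient) might be combined. It assumes the usual large-sample normal approximation; the function name `proportion_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) CI for a proportion; illustrative only."""
    p_hat = successes / n                                # point estimate
    se = sqrt(p_hat * (1 - p_hat) / n)                   # SE of the point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # confidence coefficient
    return p_hat - z * se, p_hat + z * se

low, high = proportion_interval(25, 100)
print(round(low, 3), round(high, 3))
```

Under these assumptions the interval comes out at roughly 0.17 to 0.33.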
  • 36. Hypothesis testing: A hypothesis usually results from speculation concerning observed behavior, natural phenomena, or established theory. If the hypothesis is stated in terms of population parameters such as the mean and variance, the hypothesis is called a statistical hypothesis, whereas the procedure that uses sample data to decide about it is called a test of the hypothesis.
  • 37. Types of hypotheses: The null hypothesis (represented by Ho) is a statement about the value of one or more population parameters. That is, the null hypothesis postulates that ‘there is no difference between factor and outcome’ or ‘there is no intervention effect’. The alternative hypothesis (represented by HA) states the ‘opposing’ view that ‘there is a difference between factor and outcome’ or ‘there is an intervention effect’. This hypothesis is declared to be accepted if the null hypothesis is rejected.
  • 38. Steps in Hypothesis Testing: 1. Formulate the appropriate statistical hypotheses clearly • Specify Ho and HA: two-tailed: $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$; one-tailed: $H_0: \mu \le \mu_0$ vs $H_1: \mu > \mu_0$; one-tailed: $H_0: \mu \ge \mu_0$ vs $H_1: \mu < \mu_0$. 2. State the assumptions necessary for computing probabilities • The distribution is approximately normal (Gaussian) • The variance is known or unknown 3. Select a sample and collect data • Categorical, continuous
  • 39. 4. Decide on the appropriate test statistic for the hypothesis, e.g., for one population. 5. Specify the desired level of significance for the statistical test (α = 0.05, 0.01, etc.)
  • 40. 6. Determine the critical value: a value the test statistic must attain to be declared significant. Two-tailed (α = 5%): −1.96 and 1.96; one-tailed, > (α = 5%): 1.645; one-tailed, < (α = 5%): −1.645. 7. Obtain sample evidence and compute the test statistic 8. Reach a decision and draw the conclusion • If Ho is rejected, we conclude that HA is true (or accepted). • If Ho is not rejected, we conclude that Ho may be true.
  • 41. Types of Errors in Hypothesis Tests: Whenever we reject or fail to reject Ho, we may commit an error. Two types of errors can be committed: Type I error and Type II error.
  • 42. Rule of decision making: The rejection or critical region is the range of values of a sample statistic that will lead to rejection of the null hypothesis. Obviously we cannot make both types of errors simultaneously, and in fact we may not make either, but the possibility does exist. In fact, we will usually never know whether any error has been committed. The only way to avoid any chance of error is not to make a decision.
      Type of decision    H0 true                     H0 false
      Reject H0           Type I error (α)            Correct decision (1 − β)
      Accept H0           Correct decision (1 − α)    Type II error (β)
  • 43. Test Statistics: A test statistic is a value we can compare with the known distribution of what we expect when the null hypothesis is true. The general formula of the test statistic is: test statistic = (observed value − hypothesized value) / standard error
  • 44. The P-Value: In most applications, the outcome of performing a hypothesis test is a p-value. The p-value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. A small p-value implies that the observed value would be unlikely to occur just by chance when the null hypothesis is true. That is, a small p-value suggests that there might be sufficient evidence for rejecting the null hypothesis. The p-value is defined as the probability of observing the computed test statistic value or a larger one, if H0 is true. For example, $P[Z \ge Z_{cal} \mid H_0 \text{ true}]$.
  • 45. P-value (cont.): A p-value is the probability of getting the observed difference, or one more extreme, in the sample purely by chance from a population where the true difference is zero. It is an “empirical” significance level, or indicator of the weight of evidence against the null hypothesis.
  • 46. How to calculate the P-value: • Use statistical software like SPSS, SAS, etc. • Hand calculation: obtain the test statistic (calculated z or t); find the probability for the test statistic from the standard normal table (the area between 0 and the statistic); subtract that probability from 0.5; the result is the one-tailed p-value. Note: if the test is two-tailed, multiply the result by 2.
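The hand-calculation steps for a z statistic can be sketched in a few lines. This is an illustrative helper (the name `z_p_value` is ours), using the standard normal CDF from Python's standard library in place of a printed table. The upper-tail area `1 - cdf(|z|)` is the same quantity as 0.5 minus the table area between 0 and |z|.

```python
from statistics import NormalDist

def z_p_value(z, two_tailed=True):
    """p-value for a z statistic under the standard normal distribution."""
    # Upper-tail area beyond |z|; equals 0.5 minus the table area from 0 to |z|
    upper_tail = 1 - NormalDist().cdf(abs(z))
    # Double the tail area for a two-tailed test
    return 2 * upper_tail if two_tailed else upper_tail

print(round(z_p_value(1.96), 3))                     # two-tailed
print(round(z_p_value(1.645, two_tailed=False), 3))  # one-tailed
```

For a t statistic the same logic applies, but the tail area would have to come from the t distribution with the appropriate degrees of freedom, which this sketch does not cover.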
  • 47. The P-Value (cont.): But for what values of the p-value should we reject the null hypothesis? By convention, a p-value of 0.05 or smaller is considered sufficient evidence for rejecting the null hypothesis. By using a p-value of 0.05, we are allowing a 5% chance of wrongly rejecting the null hypothesis when it is in fact true. When the p-value is less than 0.05, we often say that the result is statistically significant.
  • 48. Hypothesis testing for a single population mean: EXAMPLE 1: A researcher claims that the mean IQ of university students is 100 with a standard deviation of 10, and the observed mean for a sample of 16 students is 110. Test the hypothesis. Solution: 1. Ho: µ = 100 vs HA: µ ≠ 100. 2. Assume α = 0.05. 3. Test statistic: $z = (110 - 100)/(10/\sqrt{16}) = 10/2.5 = 4$. 4. z-critical at α/2 = 0.025 is 1.96. 5. Decision: reject the null hypothesis since 4 ≥ 1.96. 6. Conclusion: the mean IQ of the population is different from 100.
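The IQ example can be checked with a minimal sketch, assuming the value 110 is the observed sample mean and the population standard deviation of 10 is known. The variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 100, 10   # hypothesized mean and known population SD
xbar, n = 110, 16      # observed sample mean and sample size
alpha = 0.05

z_calc = (xbar - mu0) / (sigma / sqrt(n))       # (110 - 100) / 2.5
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for a two-tailed test
reject_h0 = abs(z_calc) >= z_crit

print("z =", z_calc, "critical value =", round(z_crit, 2), "reject Ho:", reject_h0)
```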
  • 49. Hypothesis testing for single proportions: Example: In a study of childhood abuse in psychiatric patients, Brown found that 166 in a sample of 947 patients reported histories of physical or sexual abuse. a) Construct a 95% confidence interval. b) Test the hypothesis that the true population proportion is 30%. Solution (a): The 95% CI for P is given by $p \pm z_{\alpha/2}\sqrt{p(1-p)/n} = 0.175 \pm 1.96\sqrt{0.175 \times 0.825/947} = 0.175 \pm 1.96(0.0124) = [0.151,\ 0.2]$
  • 50. Example (cont.): To test the hypothesis we need to follow these steps. Step 1: State the hypotheses: Ho: P = Po = 0.3, Ha: P ≠ Po = 0.3. Step 2: Fix the level of significance (α = 0.05). Step 3: Compute the calculated and tabulated values of the test statistic: $z_{cal} = (p - P_0)/\sqrt{P_0(1-P_0)/n} = (0.175 - 0.3)/\sqrt{0.3(0.7)/947} = -0.125/0.0149 = -8.39$, $z_{tab} = 1.96$
  • 51. Example (cont.): Step 4: Compare the calculated and tabulated values of the test statistic. Since the tabulated value (1.96) is smaller than the absolute value of the calculated test statistic (8.39), we reject the null hypothesis. Step 5: Conclusion: Hence we conclude that the proportion of childhood abuse in psychiatric patients is different from 0.3. If the sample size is small (if np < 5 and n(1 − p) < 5) then use Student’s t-statistic for the tabulated value of the test statistic.
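The steps of the proportion test can be sketched as follows. Note one assumption: the sketch keeps the unrounded sample proportion 166/947 rather than the rounded 0.175 used on the slide, so the statistic comes out near −8.37 instead of −8.39. The decision is unchanged.

```python
from math import sqrt
from statistics import NormalDist

x, n = 166, 947    # patients reporting abuse, sample size
p0 = 0.30          # hypothesized population proportion
alpha = 0.05

p_hat = x / n
se0 = sqrt(p0 * (1 - p0) / n)                  # standard error under Ho
z_calc = (p_hat - p0) / se0
z_tab = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96

print("p_hat =", round(p_hat, 3), "z =", round(z_calc, 2))
print("reject Ho:", abs(z_calc) > z_tab)
```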
  • 52. Two sample mean and proportion: So far we have seen estimation for only a single mean and a single proportion. However, it is possible to compute point and interval estimates for the difference of two sample means. Let x1, x2, …, xn1 be a sample from the first population and y1, y2, …, yn2 be a sample from the second population. Let the sample mean for the first population be $\bar{X}$ and the sample mean for the second population be $\bar{Y}$. Then the point estimate for the difference of means (µ1 − µ2) is given by $(\bar{X} - \bar{Y})$.
  • 53. Two sample estimation: A (1 − α)100% confidence interval for the difference of means is given by $(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, if $\sigma_1$ and $\sigma_2$ are known.
  • 54. Hypothesis testing for two sample means: The steps to test the hypothesis for a difference of means are the same as for a single mean. Step 1: State the hypotheses: Ho: µ1 − µ2 = 0 vs HA: µ1 − µ2 ≠ 0, HA: µ1 − µ2 < 0, or HA: µ1 − µ2 > 0. Step 2: Significance level (α). Step 3: Test statistic: $z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
  • 55. Example: Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down’s syndrome. The data consist of serum uric acid readings on 12 individuals with Down’s syndrome and 15 normal individuals. The means are 4.5 mg/100 ml and 3.4 mg/100 ml, with population standard deviations of 2.9 and 3.5 mg/100 ml respectively. $H_0: \mu_1 - \mu_2 = 0$, $H_A: \mu_1 - \mu_2 \neq 0$
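The slide with the worked solution is missing from the transcript, so the following is only a sketch of how the two-sample z statistic from the previous slide could be applied to these figures. It assumes group 1 is the Down's syndrome group (n = 12, SD 2.9) and group 2 the normal group (n = 15, SD 3.5), following the order of "respectively", and α = 0.05.

```python
from math import sqrt
from statistics import NormalDist

# Group 1: Down's syndrome; group 2: normal individuals (assumed ordering)
xbar1, sigma1, n1 = 4.5, 2.9, 12
xbar2, sigma2, n2 = 3.4, 3.5, 15
alpha = 0.05  # assumed; the slide does not state a level

se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)     # SE of the difference in means
z_calc = ((xbar1 - xbar2) - 0) / se            # Ho: mu1 - mu2 = 0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print("z =", round(z_calc, 3), "critical value =", round(z_crit, 2))
print("reject Ho:", abs(z_calc) > z_crit)
```

With the numbers as transcribed, the statistic is below the critical value, so under these assumptions Ho would not be rejected.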
  • 57. Estimation and hypothesis testing for two population proportions: Let n1 and n2 be the sample sizes from the two populations. If x and y are the numbers of outcomes of interest, then the point estimate for each population proportion is given by p1 = x/n1 and p2 = y/n2 respectively. The point estimate of π1 − π2 is p1 − p2. If the sample sizes are large, with n1p1 > 5, n1(1 − p1) > 5, n2p2 > 5 and n2(1 − p2) > 5, then the interval estimate for the difference of proportions is given by $(p_1 - p_2) \pm z_{\alpha/2}\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$
  • 58. Hypothesis testing for two proportions: To test the hypothesis Ho: π1 − π2 = 0 vs HA: π1 − π2 ≠ 0, the test statistic is given by $z_{cal} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}}$
  • 60. Summary: Students sometimes have difficulty deciding whether to use $z_{\alpha/2}$ or $t_{\alpha/2}$ values when finding confidence intervals.
  • 61. t-test: One sample t-test: It is used to compare the estimate from a sample with a hypothesized population mean to see if the sample is significantly different. Assumptions which should be fulfilled before we use this method: the dependent variable is normally distributed within the population; the data are independent (scores of one participant are not dependent on scores of the others).
  • 62. T-test cont.: Hypothesis: Ho: μ = μo vs HA: μ ≠ μo, where μo is the hypothesized mean value. The test statistic is: tcalc = (x̄ − μo)/(s/√n). We compare the calculated test statistic (tcalc) with the tabulated value (ttab) at n − 1 degrees of freedom.
  • 63. T-test cont.: E.g. data: the distance covered by marathon runners until physiological stress develops, and whether they used a drug or not.
      No    Distance (miles)    Drug use
      1     14.5                no
      2     13.4                no
      3     14.8                yes
      4     19.5                yes
      5     14.5                no
      6     18.2                yes
      7     16.3                no
      8     14.8                no
      9     20.3                yes
      10    18.4                yes
      11    16.9                yes
      12    12.6                no
      13    13.4                no
      14    16.3                yes
      15    17.1                yes
      16    11.8                no
      17    13.3                yes
      18    14.5                no
      Mean: 15.59; standard deviation: 2.43
  • 64. T-test cont.: It is believed that the mean distance covered before feeling physiological stress is 15 miles. Hypotheses: Ho: μ = 15 versus HA: μ ≠ 15. Level of significance: α = 5%. x̄ = 15.59, s = 2.43, tcalc = (x̄ − μ)/(s/√n) = (15.59 − 15)/0.57 = 1.03, and p-value = 0.318. At 17 degrees of freedom and α = 0.05, ttab = 2.110. Since tcalc = 1.03 < 2.110 = ttab, or α = 0.05 < 0.318 = p-value, we fail to reject Ho.
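The one-sample t statistic can be recomputed directly from the 18 distances in the data table. This sketch uses only the standard library, so it compares the statistic with the tabulated critical value quoted on the slide (2.110 at 17 df) instead of computing a p-value. The variable names are ours.

```python
from math import sqrt
from statistics import mean, stdev

# Distances (miles) for runners 1 to 18, copied from the data table
distances = [14.5, 13.4, 14.8, 19.5, 14.5, 18.2, 16.3, 14.8, 20.3,
             18.4, 16.9, 12.6, 13.4, 16.3, 17.1, 11.8, 13.3, 14.5]
mu0 = 15        # hypothesized mean distance
t_tab = 2.110   # tabulated t at 17 df, alpha = 0.05 (value from the slide)

n = len(distances)
xbar = mean(distances)
s = stdev(distances)                    # sample SD (n - 1 in the denominator)
t_calc = (xbar - mu0) / (s / sqrt(n))

print("mean =", round(xbar, 2), "SD =", round(s, 2), "t =", round(t_calc, 2))
print("reject Ho:", abs(t_calc) > t_tab)
```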
  • 65. Two sample t-test: A t-distribution can be used for testing hypotheses about differences of means for independent samples if both populations are normal and have the same variances. It assesses whether the means of two samples are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two samples/conditions. Assumptions of a t-test: the data come from a parametric population, are not (seriously) skewed, and have no outliers.
  • 66. t-tests (cont.): Compare the mean between 2 samples/conditions. If 2 samples are taken from the same population, then they should have fairly similar means. If 2 means are statistically different, then the samples are likely to be drawn from 2 different populations, i.e. they really are different. [Figure: 95% CI of "infer comp" by lesion site (left hemisphere, right hemisphere); panels Exp. 1 and Exp. 2]
  • 67. T-test cont.: b. Paired t-test: Each observation in one sample has one and only one mate in the other sample, so the two samples are dependent on each other. For example, the paired measurements can be: before and after (e.g. before and after an intervention), repeated measurements (e.g. using digital and analog apparatus), or data from two dependent sources (e.g. data from the mother and father of a respondent). Hypothesis: Ho: μd = 0 vs HA: μd ≠ 0
  • 68. T-test cont.: Example: The blood pressure (BP) of 10 mothers was measured before and after taking a new drug.
      Subject    BP before    BP after    Difference (di)
      1          130          110         -20
      2          125          130         +5
      3          140          120         -20
      4          150          130         -20
      5          120          110         -10
      6          130          130         0
      7          120          115         -5
      8          135          130         -5
      9          140          130         -10
      10         130          120         -10
      d̄ (average of d): -9.5; Sd (standard deviation of d): 8.64
  • 69. T-test cont.: Hypothesis: Ho: μd = 0 versus HA: μd ≠ 0. Set the level of significance, α = 0.05. d̄ = −9.5, Sd = 8.64, n = 10. tcalc = (d̄ − μd)/(sd/√n) = −3.48, so |tcalc| = 3.48, and p-value = 0.0075. At n − 1 = 9 df and α = 0.05, ttab = 2.26. Since ttab = 2.26 < 3.48 = |tcalc|, or p-value = 0.0075 < 0.05 = α, we reject Ho.
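The paired calculation can be reproduced from the before and after readings. This is a minimal sketch with our own variable names; differences are taken as after minus before, matching the table, and the critical value 2.26 is the tabulated figure from the slide.

```python
from math import sqrt
from statistics import mean, stdev

bp_before = [130, 125, 140, 150, 120, 130, 120, 135, 140, 130]
bp_after  = [110, 130, 120, 130, 110, 130, 115, 130, 130, 120]
t_tab = 2.26   # tabulated t at 9 df, alpha = 0.05 (value from the slide)

# Paired differences: after minus before for each mother
d = [after - before for before, after in zip(bp_before, bp_after)]

n = len(d)
d_bar = mean(d)
s_d = stdev(d)                          # sample SD of the differences
t_calc = (d_bar - 0) / (s_d / sqrt(n))  # Ho: mean difference is 0

print("d_bar =", d_bar, "s_d =", round(s_d, 2), "t =", round(t_calc, 2))
print("reject Ho:", abs(t_calc) > t_tab)
```

The statistic is negative because blood pressure fell on average; the decision uses its absolute value.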
  • 70. T-test cont.: c. Two independent samples t-test: Used to compare two unrelated or independent groups. Assumptions include: the variances of the dependent variable in the two populations are equal; the dependent variable is normally distributed within each population; the data are independent (scores of one participant are not related systematically to the scores of the others). Hypothesis: Ho: μt = μc vs HA: μt ≠ μc, where μt and μc are the population means of the treatment and control (placebo) groups respectively.
  • 71. The test statistic for the two-sample t-test (cont.): There are three cases, which depend on what is known about the population variances. Case 1: The population variances $\sigma_1^2$ and $\sigma_2^2$ are known for normal populations (or non-normal populations with both $n_1$ and $n_2$ large). In this case the test statistic is: $Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
  • 72. Case 2: The population variances are unknown but assumed to be equal ($\sigma_1^2 = \sigma_2^2 = \sigma^2$) in normal populations. In this case, we pool our estimates to get the pooled two-sample variance $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$, and the test statistic is $T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2(1/n_1 + 1/n_2)}}$, which has a $t_{n_1 + n_2 - 2}$ distribution if $H_0$ is true.
  • 73. Case 3: $\sigma_1^2$ and $\sigma_2^2$ are unknown and unequal in normal populations. In this case the test statistic is given by $T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$, which does not have an exactly known distribution. If both n1 and n2 are large (both over 30) we can assume a normal distribution.
  • 74. Example: Do the marathon runners grouped by their drug intake status differ in their average distance covered before they feel any physiological stress? Hypothesis: Ho: μt = μc vs HA: μt ≠ μc, where μt and μc are for drug users and non-users respectively. Set the level of significance, α = 5%. x̄c = 13.98, sc = 1.33, x̄t = 17.20, st = 2.21. tcalc = (x̄c − x̄t)/√(S²(1/nc + 1/nt)) = −3.741, and its p-value = 0.002, where S² is the pooled (combined) variance of both groups. At 16 df and α = 0.05, ttab = −2.12. Since tcalc = −3.741 < −2.12, or p-value = 0.002 < 0.05 = α, we reject Ho.
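The pooled two-sample statistic (Case 2) can be recomputed from the marathon data split by drug use. This is a sketch assuming equal population variances, as the example does; the group lists are copied from the data table and the variable names are ours.

```python
from math import sqrt
from statistics import mean, variance

# Distances (miles) from the data table, split by drug use
users     = [14.8, 19.5, 18.2, 20.3, 18.4, 16.9, 16.3, 17.1, 13.3]  # treatment (t)
non_users = [14.5, 13.4, 14.5, 16.3, 14.8, 12.6, 13.4, 11.8, 14.5]  # control (c)
t_tab = 2.12   # tabulated |t| at 16 df, alpha = 0.05 (value from the slide)

n_t, n_c = len(users), len(non_users)

# Pooled variance: sample variances weighted by their degrees of freedom
s2_pooled = ((n_c - 1) * variance(non_users) + (n_t - 1) * variance(users)) / (n_c + n_t - 2)

t_calc = (mean(non_users) - mean(users)) / sqrt(s2_pooled * (1 / n_c + 1 / n_t))

print("t =", round(t_calc, 3), "df =", n_c + n_t - 2)
print("reject Ho:", abs(t_calc) > t_tab)
```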
  • 75. T-test cont.: In the case of the two independent samples t-test, we have one continuous dependent variable (interval/ratio data) and one nominal or ordinal independent variable with only two categories. In this last case (i.e. the two independent samples t-test), what if there are more than two categories for the independent variable?
  • 76. Inferences for Two or More Means: Are the birth weights of children in different geographical regions the same? Are the responses of patients to different medications and placebo different? Do people in different age groups have different proportions of body fat? Do people of different ethnicities have the same BMI?
  • 77. Assignment: Prevention of violations of assumptions. Detection of violations of assumptions. Goodness-of-fit tests.