1. Confidence Intervals
• We can also put confidence intervals around other sample statistics (not just the sample mean)
• The CI expresses the precision of the sample statistic:
100(1-α)% CI: estimate ± k × (standard error)
• So the CI equals our estimate plus or minus some multiple of the standard error
• A 95% CI means that in about 95 out of 100 samples of size n, the CI would contain the population parameter
• Also common: 90% or 99% CIs
2. Confidence Intervals
$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
The width of the confidence interval depends on:
• The variation in the population (i.e. the standard deviation σ, which is fixed but possibly unknown): the more variation, the wider the CI
• The sample size n: the larger the sample, the narrower the CI
• The confidence level: a 99% CI is wider than a 95% CI because we have to be "more sure" to capture the true value
100(1-α)% CI: estimate ± k × (standard error)
3. Confidence Intervals
$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
• Why does the sample size n have to be large?
• CIs are valid when the CLT holds
• The sample size n affects the standard error
• Remember that a fraction gets closer to zero as its denominator gets larger
• As n → ∞ the SE $\frac{\sigma}{\sqrt{n}}$ gets closer and closer to 0
100(1-α)% CI: estimate ± k × (standard error)
4. Calculating Confidence Intervals
a) Single Mean
b) Difference between means - independent samples
c) Difference between means - dependent samples
d) Single Proportion
e) Difference between proportions - independent samples
5. Calculating Confidence Intervals
a) CI for a Single Mean (known σ)
• If the sample size is large and the population SD is known…
• The CI formula for a single mean is:
$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
x̄ is the sample mean
σ is the population standard deviation
n is the sample size
Z is a cutoff from the standard normal distribution
90% CI: Z = 1.64
95% CI: Z = 1.96
99% CI: Z = 2.58
6. Calculating Confidence Intervals
a) CI for a Single Mean (known σ) EXAMPLE
• We want to estimate the age of death for the US population
• We are told the population SD is 20.2 years
• In a sample of 100 people we calculate a mean age of death of 72.1 years
• A 95% confidence interval for the mean age of death is:
$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
where:
x̄ = 72.1
σ = 20.2
n = 100
Z = 1.96
$72.1 + 1.96 \times \frac{20.2}{\sqrt{100}} \approx 76.06$
$72.1 - 1.96 \times \frac{20.2}{\sqrt{100}} \approx 68.14$
95% CI: (68.14, 76.06)
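The age-of-death calculation can be checked with a few lines of Python. This is only a sketch: the function name `z_interval` is made up for illustration, and it assumes the population SD is known and a standard normal cutoff applies.

```python
import math

def z_interval(mean, sigma, n, z=1.96):
    """CI for a single mean with known population SD: mean +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin

# Values from the slide: sample mean 72.1, population SD 20.2, n = 100, Z = 1.96
low, high = z_interval(72.1, 20.2, 100)
print(round(low, 2), round(high, 2))
```

Passing `z=1.64` or `z=2.58` would give the 90% and 99% intervals using the cutoffs listed earlier.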
7. C.I. for the difference between population means (normally distributed)
• Known variance (2 independent samples)
The 100(1−α)% C.I. for μ1 − μ2 is
$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
8. Calculating Confidence Intervals
a) CI for a Single Mean (unknown σ)
• Usually the population SD is not known
• We use the sample standard deviation s instead
• Use the t-distribution instead of the standard normal
• The CI formula for a single mean is:
$\bar{x} \pm t \frac{s}{\sqrt{n}}$
x̄ is the sample mean
s is the sample standard deviation
n is the sample size
t is a cut-off from the t-distribution with n − 1 degrees of freedom
9. Calculating Confidence Intervals
Student's t-distribution
• Similar to the normal distribution but with fatter tails
• Accounts for the extra uncertainty from not knowing σ
• A family of curves; the cut-off value is determined by two quantities:
- significance level α
- degrees of freedom df
• The degrees of freedom is n − 1 because we've already used one piece of information (the sample mean) to estimate σ
10. Calculating Confidence Intervals
Student's t-distribution
• Calculate cut-off values in Stata using the command:
. display invttail(df, p)
where df = n − 1 and p is the area in the right tail
• For a 95% CI use df = n − 1 and p = 0.025
• Example: if n=100 the t-value for a 95% CI would be
. display invttail(99, 0.025)
1.984217
11. Calculating Confidence Intervals
a) CI for a Single Mean (unknown σ) EXAMPLE
Sample summary: n = 462, x̄ = 87.93723, s = 16.00469
Using the formula:
$\bar{x} \pm t \frac{s}{\sqrt{n}}$
The t-value has df = n − 1 = 462 − 1 = 461 and p = 0.025
. display invttail(461,0.025)
1.9651232
. display 87.93723 - 1.9651232 * 16.00469 / sqrt(462)
86.473988
. display 87.93723 + 1.9651232 * 16.00469 / sqrt(462)
89.400472
The 95% CI for mean zinc is (86.5, 89.4)
13. Calculating Confidence Intervals
Effect of confidence level on width of CI:
95% CI is the default (α = 0.05)
Use the level() option to change the confidence level to 99% (α = 0.01)
The CI becomes WIDER
[86.0 … [86.4 … x̄ … 89.4] … 89.9]
Inner interval: 95% CI; outer interval: 99% CI
14. Calculating Confidence Intervals
Effect of sample size on width of CI:
If we increase the sample size from 31 to 131
The standard error decreases and…
The CI becomes NARROWER
[145.4 … [145.6 … x̄ … 148.6] … 149.8]
Inner interval: n=131; outer interval: n=31
15. Calculating Confidence Intervals
Effect of SD on width of CI:
If the standard deviation increases from 6 to 10
The standard error increases and…
The CI becomes WIDER
[143.9 … [145.4 … x̄ … 148.8] … 151.3]
Inner interval: s=6; outer interval: s=10
16. Calculating Confidence Intervals
Comparing Z and t:
• For large n, the t-distribution approximates the normal
• Suppose the sample mean age of death is x̄ = 72.1 yrs and that the sample SD is equal to the population SD (s = σ = 20.2 yrs)
Normal dist., CI formula $\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$:
- Large sample size (n=100), 95% CI: Z=1.96, $72.1 \pm 1.96 \times \frac{20.2}{\sqrt{100}}$ = (68.14, 76.06)
- Small sample size (n=10), 95% CI: Z=1.96, $72.1 \pm 1.96 \times \frac{20.2}{\sqrt{10}}$ = (59.58, 84.62)
t-dist., CI formula $\bar{x} \pm t \frac{s}{\sqrt{n}}$:
- Large sample size (n=100), 95% CI: t=1.98, $72.1 \pm 1.98 \times \frac{20.2}{\sqrt{100}}$ = (68.10, 76.10)
- Small sample size (n=10), 95% CI: t=2.26, $72.1 \pm 2.26 \times \frac{20.2}{\sqrt{10}}$ = (57.66, 86.54)
Similar CI for large n; t has a wider CI than Z for small n
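The four intervals in the Z versus t comparison can be reproduced as follows. This is a sketch only: the helper name `mean_interval` is hypothetical, and the cut-offs (1.96, 1.98, 2.26) are taken from the slide as given rather than computed.

```python
import math

def mean_interval(mean, sd, n, cutoff):
    """Return (lower, upper) for mean +/- cutoff * sd / sqrt(n), rounded to 2 decimals."""
    margin = cutoff * sd / math.sqrt(n)
    return round(mean - margin, 2), round(mean + margin, 2)

# Cut-off values copied from the slide
z_large = mean_interval(72.1, 20.2, 100, 1.96)
t_large = mean_interval(72.1, 20.2, 100, 1.98)
z_small = mean_interval(72.1, 20.2, 10, 1.96)
t_small = mean_interval(72.1, 20.2, 10, 2.26)

for label, ci in [("Z, n=100", z_large), ("t, n=100", t_large),
                  ("Z, n=10", z_small), ("t, n=10", t_small)]:
    print(label, ci)
```

With n=100 the two intervals differ only in the second decimal, while with n=10 the t interval is visibly wider.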
17. Calculating Confidence Intervals
b) CI for a difference in means (independent samples)
• Suppose we have two independent groups of data and calculate a sample mean and sample SD for each. The CI formula is:
$(\bar{x}_1 - \bar{x}_2) \pm t \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Where:
x̄1 and x̄2 are the sample means
n1 and n2 are the sample sizes
t is a cut-off from the t-distribution with df = n1 + n2 − 2
s_p² is the pooled variance:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
where s1 and s2 are the sample standard deviations
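A minimal sketch of the pooled-variance interval, assuming equal population SDs. The function name `pooled_interval` and the example numbers are hypothetical (they do not come from the slides), and the t cut-off is passed in rather than computed; 2.101 is the usual tabulated value for df = 18 at 95%.

```python
import math

def pooled_interval(n1, mean1, sd1, n2, mean2, sd2, t_cutoff):
    """CI for a difference in means using the pooled variance."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_cutoff * se, diff + t_cutoff * se

# Hypothetical data: two groups of 10, means 12 and 10, both SDs equal to 2
low, high = pooled_interval(10, 12.0, 2.0, 10, 10.0, 2.0, t_cutoff=2.101)
print(round(low, 2), round(high, 2))
```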
18. Calculating Confidence Intervals
b) CI for a difference in means (independent samples)
Assumptions:
1. The population standard deviations are approximately equal.
We check this by comparing the sample standard deviations.
2. For small sample sizes (say n<100) the population distribution should approximately follow the normal distribution.
This is checked by assessing the sample data for normality. The assumption is fairly robust in that the formula is valid as long as the distribution of data in the sample is approximately mound shaped and symmetrical.
3. The 2 groups are independent.
4. The subjects within the 2 groups are independent.
19. Calculating Confidence Intervals
b) CI for a difference in means (independent samples)
Example using Stata command:
ttesti n1 x̄1 s1 n2 x̄2 s2
20. Calculating Confidence Intervals
b) CI for a difference in means (independent samples)
ttesti assumes the population SDs are equal by default
Use the unequal option if s1 and s2 are not similar (the df is affected)
21. Calculating Confidence Intervals
b) CI for a difference in means (independent samples)
Suppose we want a CI for the difference in mean height between men and women (assuming independence and normality hold…)
ttest varname, by(groupvar)
(Output annotation: approx. equal SDs)
22. Calculating Confidence Intervals
c) CI for a difference in means (dependent samples)
• Dependent samples occur with two groups of paired or matched data
• Usually equal sample sizes in the 2 groups (1:1)
e.g.
- patient blood pressure before and after a treatment
- patient left leg and right leg measurements
- two groups where pairs of people have been matched on important demographics (age, sex, etc.)
23. Calculating Confidence Intervals
c) CI for a difference in means (dependent samples)
1. Calculate the pair differences d
e.g. for each patient, d = BP_after − BP_before
2. Find the mean d̄ and standard deviation s_d of the pair differences
3. The CI for the mean pair difference is:
$\bar{d} \pm t \frac{s_d}{\sqrt{n}}$
Where n is the number of pairs and t has df = n − 1
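The three steps for paired data can be sketched in Python. The blood pressure values below are invented purely for illustration, `paired_interval` is a hypothetical helper name, and the t cut-off (2.776 for df = 4 at 95%) is supplied from a table rather than computed.

```python
import math
import statistics

def paired_interval(before, after, t_cutoff):
    """CI for the mean pair difference d = after - before."""
    diffs = [a - b for a, b in zip(after, before)]   # step 1: pair differences
    d_bar = statistics.mean(diffs)                    # step 2: mean of differences
    s_d = statistics.stdev(diffs)                     # step 2: sample SD of differences
    margin = t_cutoff * s_d / math.sqrt(len(diffs))   # step 3: t * s_d / sqrt(n)
    return d_bar - margin, d_bar + margin

# Hypothetical data for 5 patients (not from the slides)
before = [120, 130, 125, 140, 135]
after = [118, 126, 124, 135, 132]
low, high = paired_interval(before, after, t_cutoff=2.776)
print(round(low, 2), round(high, 2))
```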
24. Calculating Confidence Intervals
c) CI for a difference in means (dependent samples) EXAMPLE
• The heart rates of 20 patients before and after a treatment
• Want a 95% CI for the difference in mean heart rate
25. Calculating Confidence Intervals
c) CI for a difference in means (dependent samples) EXAMPLE
First, calculate the differences
Then use the formula $\bar{d} \pm t \frac{s_d}{\sqrt{n}}$
The 95% CI is: (1.5, 11.2)
27. Calculating Confidence Intervals
d) CI for a single proportion
• Suppose we have a population of subjects and some of them have a characteristic of interest and the rest don't
- e.g. being female, having a cancer diagnosis, survived
• We want to estimate the true proportion p who have the characteristic of interest
• If r is the number of sample subjects that have the characteristic and n is the sample size then the sample proportion is:
$\hat{p} = \frac{r}{n}$
28. Calculating Confidence Intervals
d) CI for a single proportion
• If n is large enough and p is not too extreme, then the sampling distribution of p̂ is approximately normal (CLT)
• The standard error of a proportion is:
$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
• The CI formula for a single proportion is:
$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
where Z is a standard normal cut-off (95% CI: Z=1.96)
29. Calculating Confidence Intervals
d) CI for a single proportion
$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
• This formula assumes the rule of thumb:
np̂ and n(1 − p̂) must both be greater than 5
• (n must be large enough too)
• Otherwise, the formula is not valid and we'd have to use exact binomial values instead of Z cut-offs
30. Calculating Confidence Intervals
d) CI for a single proportion EXAMPLE
• Consider the 5-yr survival for lung cancer patients (Pagano p328).
• We want to estimate the proportion p who survive 5 yrs since dx.
• In a random sample of n=52 patients only r=6 survive 5 yrs (p̂ = r/n = 6/52 ≈ 0.12).
• Check: np̂ ≈ 6.24 and n(1 − p̂) ≈ 45.76, so the rule of thumb holds
• A 95% CI for p is:
$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
95% CI: (0.03, 0.21)
So between 3% and 21% of ptx with lung cancer survive 5 yrs after dx.
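The lung cancer example can be reproduced with a short sketch. The function names are hypothetical, and the rounded value p̂ = 0.12 is used because that is what the slide's check values (6.24 and 45.76) imply; using the unrounded 6/52 shifts the upper limit slightly.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Normal-approximation CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def rule_of_thumb_ok(p_hat, n):
    """Both n * p_hat and n * (1 - p_hat) must exceed 5."""
    return n * p_hat > 5 and n * (1 - p_hat) > 5

p_hat = 0.12  # 6/52 rounded to two decimals, as on the slide
n = 52
print(rule_of_thumb_ok(p_hat, n))
low, high = proportion_interval(p_hat, n)
print(round(low, 2), round(high, 2))
```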
31. Calculating Confidence Intervals
d) CI for a single proportion EXAMPLE
• In a random sample of n=52 patients only r=6 survive 5 yrs
Using Stata to find the 95% CI:
cii proportions n r
Why is this answer different to the one we calculated with the formula?
Stata uses exact binomial values instead of the normal approximation to the binomial
32. Calculating Confidence Intervals
Similarly to the CI for a single mean, the width of the CI for a single proportion is affected by:
• The sample size n
- Increasing the sample size makes the CI narrower/more precise, i.e. small samples have wider CI/less precision
- The standard error decreases as n increases
• The confidence level
- A 99% CI is wider than a 95% CI, which is wider than a 90% CI
33. Calculating Confidence Intervals
e) CI for a difference in proportions (independent samples)
• One group has sample proportion p̂1 and sample size n1
• Second group has sample proportion p̂2 and sample size n2
• The CI formula is:
$(\hat{p}_1 - \hat{p}_2) \pm Z \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
• The RHS looks complicated but really it's just the standard error for p̂1 − p̂2:
$SE = \sqrt{\mathrm{var}(\hat{p}_1) + \mathrm{var}(\hat{p}_2)}$
34. Calculating Confidence Intervals
e) CI for a difference in proportions (independent samples)
$(\hat{p}_1 - \hat{p}_2) \pm Z \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
• The formula is only valid for large samples and not too extreme values of p1 and p2
• The rule of thumb is: if $\hat{p} = \frac{r_1 + r_2}{n_1 + n_2}$, then
n1p̂ and n1(1 − p̂) must both be greater than 5
and
n2p̂ and n2(1 − p̂) must both be greater than 5
• You need to be able to check the rule of thumb and use the CI formula (we will calculate the CI using Stata)
35. Calculating Confidence Intervals
e) CI for a difference in proportions (independent samples) EXAMPLE
Stata command:
prtesti n1 r1 n2 r2, count
n1 = 100, r1 = 80
n2 = 100, r2 = 50
36. Calculating Confidence Intervals
e) CI for a difference in proportions (independent samples) EXAMPLE
The difference in proportion of ptx who had pain relief by surgery and by meds was between 17% and 43% (a higher % of surgery ptx had pain relief compared to med ptx)
(Output annotation: difference in sample proportions)
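As a rough check on the 17% to 43% result, the formula for a difference in proportions can be applied to the counts from the prtesti example (n1 = 100, r1 = 80, n2 = 100, r2 = 50). This is a Python sketch of the normal-approximation formula only, with a hypothetical function name; it is not a reproduction of Stata's output.

```python
import math

def diff_proportion_interval(r1, n1, r2, n2, z=1.96):
    """CI for p1 - p2 using SE = sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    p1, p2 = r1 / n1, r2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

low, high = diff_proportion_interval(80, 100, 50, 100)
print(round(low, 2), round(high, 2))
```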
37. One more thing…
• Some Stata commands used in the lectures/tutes and Modules are different in previous versions of Stata:
If you are using Stata 14 (the latest version) the CI commands are:
cii means n mean sd
ci means varname
cii proportions n r
If you are using an older version of Stata the CI commands are:
cii n mean sd
ci varname
cii n r
54. 2. Intersection of Two Events: (A∩B)
- The event A∩B consists of all outcomes in both A and B.
- The event A∩B occurs if both A and B occur.
1/4/2021 54
56. Example: (Classical Probability)
• Experiment: Selecting a patient randomly from a hospital room having six beds numbered 1, 2, 3, 4, 5, and 6.
• Define the following events:
(1) E1∪E2 = {1, 2, 3, 4, 6} = selecting an even number or a number less than 4.
58. 3) E1∩E2 = {2} = selecting an even number and a number less than 4.
4) E1∩E4 = ∅ = selecting an even number and an odd number.
59. • E1∩E4 = ∅: in this case, E1 and E4 are called disjoint (or mutually exclusive) events.
• These kinds of events cannot occur simultaneously (together at the same time).
61. • Mutually Exclusive (Disjoint) Events
- The events A and B are disjoint (or mutually exclusive) if A∩B = ∅. In that case:
I. P(A∩B) = 0
II. P(A∪B) = P(A) + P(B)
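The six-bed example maps directly onto Python sets. In this sketch E1 is the event "even number", E2 is "number less than 4" and E4 is "odd number", as used in the example; the helper `prob` is a hypothetical name and assumes all six beds are equally likely.

```python
beds = {1, 2, 3, 4, 5, 6}

E1 = {b for b in beds if b % 2 == 0}   # even number
E2 = {b for b in beds if b < 4}        # number less than 4
E4 = beds - E1                         # odd number

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return len(event) / len(beds)

union = E1 | E2          # E1 or E2
intersection = E1 & E2   # E1 and E2
print(union, intersection, E1 & E4)

# E1 and E4 are disjoint, so P(E1 or E4) = P(E1) + P(E4)
print(prob(E1 | E4), prob(E1) + prob(E4))
```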
66. Marginal Probability
• Given some variable that can be broken down into (m) categories designated by A1, A2, . . ., Am and another jointly occurring variable that is broken down into (s) categories designated by B1, B2, . . . , Bs
• The marginal probability of Ai, P(Ai), is equal to the sum of the joint probabilities of Ai with all categories of B.
• That is
$P(A_i) = \sum_{j=1}^{s} P(A_i \cap B_j)$
67. Example: Relative Frequency or Empirical
• Let us consider a bivariate table for variables A and B.
• There are three categories for both the variables: A1, A2, and A3 for A, and B1, B2, and B3 for B
Joint frequency distribution for m categories of A and s categories of B
68. • Joint probability distribution for m categories of A and s categories of B
69. • Number of elements in each cell
Probabilities of events
70. Applications of Relative Frequency (Empirical Probability)
• Let us consider hypothetical data on four types of diseases of 200 patients from a hospital as shown below:
• Experiment: Selecting a patient at random and observing his/her disease type. The total number of trials (sample size) in this case is n = 200
Disease type:       A   B   C   D   Total
Number of patients: 90  80  20  10  200
74. Multiplication Rules of Probability
Let us consider a hypothetical set of data on 600 adult males classified by their ages and smoking habits, as summarized in a two-way table.
Consider the following event:
(B1|A2) = smokes daily given that age is between 30 and 39
75. • Two-way table displaying number of respondents by age and smoking habit
78. Binomial Distribution,…
• Example:
- Suppose all birth records for a calendar year show that 85.8% of the pregnancies had delivery in week 37 or later.
- The 85.8% is interpreted as the probability of a recorded birth in week 37 or later
- If we randomly select five birth records from this population, what is the probability that exactly three of the records will be for full-term births?
79. Binomial Distribution,…
• Let us designate the occurrence of a record for a full-term birth (F) as a "success" and hasten to add that a premature birth (P) is not a failure
• It will also be convenient to assign the number 1 to a success and the number 0 to a failure (record of a premature birth).
80. Binomial Distribution,…
• Suppose the five birth records selected resulted in this sequence of full-term (F) and premature (P) births
FPFFP
• In coded form we would write this as
10110
P(1, 0, 1, 1, 0) = pqppq = q²p³
81. Binomial Distribution,…
• Three successes and two failures could occur in any one of the following additional sequences as well:
From the addition rule we know that this probability is equal to the sum of the individual probabilities. In the present example we need to sum the 10 q²p³'s or, equivalently, multiply q²p³ by 10.
82. Binomial Distribution,…
• Answer for original question is
• Since in the population, p = 0.858; q = (1 − p) = (1 − 0.858) = 0.142
10q²p³ = 10(0.142)²(0.858)³
= 10(0.0202)(0.6316)
= 0.1276
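The birth-record probability can be checked numerically. This is a sketch using Python's `math.comb` for the count of sequences; `binomial_pmf` is a hypothetical helper name. Note that without rounding the intermediate factors the result is about 0.1274, so the 0.1276 in the hand calculation reflects rounding of 0.142² and 0.858³.

```python
import math

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

sequences = math.comb(5, 3)            # number of ways to place 3 successes in 5 trials
prob_three = binomial_pmf(3, 5, 0.858)
print(sequences, round(prob_three, 4))
```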
83. Combinations
• A combination of n objects taken x at a time is an unordered subset of x of the n objects
• Combination is used in large sample procedures
• The number of combinations of n objects that can be formed by taking x of them at a time is given by
$_nC_x = \frac{n!}{x!(n-x)!}$
• where x!, read x factorial, is the product of all the whole numbers from x down to 1. That is, x! = x(x-1)(x-2)…(1). We note that, by definition, 0! = 1.
84. Binomial Distribution,…
• Let us return to our example in which we have a sample of n=5 birth records and we are interested in finding the probability that three of them will be for full-term births
$_5C_3 = \frac{5!}{3!(5-3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(2 \times 1)} = \frac{120}{12} = 10$
85. Binomial Distribution,…
• In our example we let x = 3, the number of successes, so that n − x = 2, the number of failures. We then may write the probability of obtaining exactly x successes in n trials as
$f(x) = {}_nC_x\, q^{n-x} p^x = {}_nC_x\, p^x q^{n-x}$, for x = 0, 1, 2, …, n
• The Binomial Parameters
- The binomial distribution has two parameters, n and p
- μ = np
- σ² = np(1 − p) = npq
86. Poisson Distribution
• Used to model a discrete random variable representing the number of occurrences or counts of some random events in an interval of time or space (or some volume of matter)
• The possible values of X = x are x = 0, 1, 2, 3, …
• The discrete random variable, X, is said to have a Poisson distribution with parameter (mean) λ if the probability distribution of X is given by
$f(x) = \frac{e^{-\lambda} \lambda^x}{x!}$
87. Poisson Distribution,…
• where e = 2.71828 (the base of the natural logarithm).
• λ (lambda) is the parameter of the distribution and is the average number of occurrences of the random event in the interval
• The Poisson Process
- The occurrences of the events are independent
- The probability of a single occurrence of the event in a given interval is proportional to the length of the interval
88. Poisson Distribution,…
- In any infinitesimally small portion of the interval, the probability of more than one occurrence of the event is negligible
Example
- In a study of drug-induced anaphylaxis among patients taking rocuronium bromide as part of their anesthesia, the occurrence of anaphylaxis followed a Poisson model with λ=12 incidents per year in Norway
89. Poisson Distribution,…
- Find the probability that in the next year, among patients receiving rocuronium, exactly three will experience anaphylaxis
$f(x=3) = \frac{e^{-12}\, 12^3}{3!} = 0.00177$
• What is the probability that at least three patients in the next year will experience anaphylaxis if rocuronium is administered with anesthesia?
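Both anaphylaxis questions can be worked through with a small sketch. The helper name `poisson_pmf` is hypothetical; the "at least three" probability is obtained by the complement rule, 1 − P(0) − P(1) − P(2), with λ = 12 as stated in the example.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability f(x) = e^(-lam) * lam^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 12  # incidents per year, from the example

p_exactly_3 = poisson_pmf(3, lam)
# Complement rule: P(X >= 3) = 1 - [P(0) + P(1) + P(2)]
p_at_least_3 = 1 - sum(poisson_pmf(k, lam) for k in range(3))

print(round(p_exactly_3, 5), round(p_at_least_3, 5))
```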
90. Poisson Distribution
• Example: Suppose that the number of accidents per day in a city has a Poisson distribution with an average of 2 accidents.
1. What is the probability that in a day
I. the number of accidents will be 5,
II. the number of accidents will be less than 2.
2. What is the probability that there will be six accidents in 2 days?
3. What is the probability that there will be no accidents in an hour?
92. Probability Distributions of Continuous data
• A non-negative function f(x) is called a probability distribution of the continuous random variable X if the total area bounded by its curve and the x-axis is equal to 1, and if the subarea under the curve bounded by the curve, the x-axis, and perpendiculars erected at any two points a and b gives the probability that X is between the points a and b.
• Also known as the probability density function
93. Probability Distributions of Continuous data…
Graph of a continuous distribution showing area between a and b.
94. Normal distribution
• Known as the Gaussian distribution
• The normal density is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty$
where e = 2.71828 and π = 3.14159.
• The parameters of the distribution are μ and σ²: X ~ N(μ, σ²).
95. Normal Distribution,…
• The density function of X, f(x), is a bell-shaped curve
- The highest point of the curve of f(x) is at the mean μ. Hence, the mode = mean = μ.
- The curve of f(x) is symmetric about the mean μ.
- In other words, mean = mode = median
- The area under the curve is 1
96. Standard Normal Distribution
• The standard normal distribution is the normal distribution with mean μ = 0 and variance σ² = 1
• Denoted by Normal(0,1) or N(0,1).
• The standard normal random variable is denoted by Z, and we write Z ~ N(0,1), where
$Z = \frac{x - \mu}{\sigma}$
• The equation for the standard normal density is written
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$
98. Standard Normal Distribution,…
• A Z-transformation that yields a value of Z = 1 indicates that the value of x used in the transformation is 1 standard deviation above the mean.
• A value of Z = -1 indicates that the value of x used in the transformation is 1 standard deviation below the mean.
99. Standard Normal Distribution,…
• Example:
- What is the probability that a z picked at random from the population of z's will have a value between -2.55 and 2.55?
answer:
P(-2.55<z<2.55)=0.9946-0.0054
=0.9892
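Instead of reading the two areas from a Z table, the same probability can be obtained from the standard normal CDF. This sketch uses `statistics.NormalDist` from the Python standard library (available from Python 3.8).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

upper_area = z.cdf(2.55)    # area to the left of  2.55
lower_area = z.cdf(-2.55)   # area to the left of -2.55
p_between = upper_area - lower_area

print(round(upper_area, 4), round(lower_area, 4), round(p_between, 4))
```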
101. Application of normal distribution
• The normal distribution is not a law that is adhered to by all measurable characteristics occurring in nature
• However, many of these characteristics are approximately normally distributed
• Used to model the distribution of many variables that are of interest
• Allows us to make useful probability statements about some variables more conveniently than would be the case if some more complicated model had to be used
102. Application,…
• Example:
- Let us consider that the weight of women of reproductive age follows a normal distribution with mean 49 kg and variance 25 kg²
a. Find the probability that a randomly chosen woman of reproductive age has weight less than 45 kg.
b. What is the percentage of women having weight less than 45 kg?
c. In a population of 20,000 women of reproductive age, how many would you expect to have weight less than 45 kg?
103. • Solution
- Here the random variable X = weight of women of reproductive age, population mean = 49 kg, population variance = σ² = 25 kg², population standard deviation = σ = 5 kg. Hence, X ~ Normal(49, 25).
a. The probability that a randomly chosen woman of reproductive age has weight less than 45 kg is P(X<45) = P(Z < (45 − 49)/5) = P(Z < −0.8) = 0.2119
104. Application,…
b. The percentage of women of reproductive age who have weight less than 45 kg is P(X<45) × 100% = 0.2119 × 100% = 21.19%
c. In a population of 20,000 women of reproductive age, we would expect that the number of women with weight less than 45 kg is P(X<45) × 20,000 = 0.2119 × 20,000 = 4238.
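The three parts of the weight example can be checked with `statistics.NormalDist`. This is a sketch under the stated model (mean 49 kg, SD 5 kg). Because it keeps the unrounded probability (about 0.21186), the expected count comes out near 4237 rather than the 4238 obtained by multiplying the rounded 0.2119.

```python
from statistics import NormalDist

weights = NormalDist(mu=49, sigma=5)   # variance 25 kg^2 means SD 5 kg

p_under_45 = weights.cdf(45)           # part a: P(X < 45)
percent_under_45 = p_under_45 * 100    # part b: as a percentage
expected_count = p_under_45 * 20000    # part c: expected number out of 20,000

print(round(p_under_45, 4), round(percent_under_45, 2), round(expected_count))
```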
105. Advantages of sampling:
• Feasibility: Sampling may be the only feasible method of collecting the information.
• Reduced cost: Sampling reduces demands on resources such as finance, personnel, and material.
• Greater accuracy: Sampling may lead to better accuracy of collecting data
• Sampling error: Precise allowance can be made for sampling error
• Greater speed: Data can be collected and summarized more quickly
106. Disadvantages of sampling:
• There is always a sampling error.
• Sampling may create a feeling of discrimination within the population.
• Sampling may be inadvisable where every unit in the population is legally required to have a record.
107. Sampling technique
• There are two different approaches to sampling in survey research:
- Nonprobability sampling
- Probability sampling
108. Probability sampling methods
• A sample obtained in a way that every member of the population has a known and non-zero probability of being included in the sample
- i.e. it involves random selection of the sample
- Involves the selection of a sample from a population, based on chance
• Probability sampling is
- More complex,
- More time-consuming
- Usually more costly than non-probability sampling.
109. EXAMPLE OF SIMPLE RANDOM SAMPLING
• Age at first sex and associated factors for early sexual initiation among students at University of AU, Central Ethiopia
- There are a total of 8,000 students
- We want to select 700 sample students
- In this case, we assumed homogeneity with respect to age at first sex
- Their IDs can be taken as the sampling frame
- Hence we can use computer-generated random numbers to select 700 students randomly
110. Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where N is the total population size and n is the sample size).
2. Determine the sampling interval (K) by dividing the number of units in the population by the desired sample size: K = N/n, where K = sampling interval, N = population size, n = sample size.
3. Draw a random number between one and K. This number is called the random start and would be the first number included in your sample.
- Let the selected number be j
4. Select every Kth unit after that first number: j, j+K, j+2K, j+3K, …, j+(n−1)K
111. EXAMPLE
• A systematic sample is to be selected from 1200 students of a school. The sample size selected is 100. The sampling interval (skip interval) is k = 1200/100 = 12
• The number of the first student to be included in the sample is chosen randomly, for example by blindly picking one out of twelve pieces of paper, numbered 1-12.
• If number 6 is picked, then every twelfth student will be included in the sample, starting with student number 6, until 100 students are selected: the numbers selected would be 6, 18, 30, 42, etc.
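The school example can be expressed as a short sketch. The function name `systematic_sample` is hypothetical, and the code assumes N is an exact multiple of n (as with 1200 and 100) so that the interval is a whole number.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Select every k-th unit, where k = N / n, beginning at a random start in 1..k."""
    k = population_size // sample_size       # sampling interval
    if start is None:
        start = random.randint(1, k)         # random start between 1 and k
    return [start + i * k for i in range(sample_size)]

# The example from the text: N = 1200, n = 100, random start 6
sample = systematic_sample(1200, 100, start=6)
print(sample[:4], sample[-1], len(sample))
```

Leaving out `start` draws the random start automatically, which is what would be done in practice.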
112. Stratified sampling …
The procedures are:
- Divide the total population into different homogeneous subgroups (strata)
- Allocate the sample to each stratum (ni)
• Proportional allocation: ni = Ni(n/N), where
ni = sample for each stratum
Ni = total population of each stratum
n = required sample size
N = total population
• Disproportional (equal) allocation is sometimes also possible
113. Example
• A survey is conducted on household water supply in a district comprising 20,000 households, of which 20% are urban and 80% rural
• It is suspected that in urban areas the access to safe water sources is much more satisfactory. The total population of the district is 10,000 (urban=4000 and rural=6000). The sample size required has been decided to be 300
• Allocate the sample proportionally for both strata.
n_urban = 4000 × 300/10,000 = 120
n_rural = 6000 × 300/10,000 = 180
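Proportional allocation is a one-line calculation per stratum. In this sketch the function name `proportional_allocation` is hypothetical; it rounds each allocation to a whole number, which in general may leave the total slightly different from n, although not in this example.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate n_i = N_i * (n / N) to each stratum, rounded to whole units."""
    total = sum(strata_sizes.values())
    return {name: round(size * sample_size / total)
            for name, size in strata_sizes.items()}

# Strata sizes and sample size from the example
allocation = proportional_allocation({"urban": 4000, "rural": 6000}, 300)
print(allocation)
```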
114. Steps in cluster sampling
• The reference population (homogeneous) is divided into clusters.
- These clusters are often geographic units (e.g. districts, villages, etc.).
• A number of clusters are selected randomly to represent the total population, and then all units within selected clusters are included in the sample.
• No units from non-selected clusters are included in the sample; they are represented by those selected clusters
- This differs from stratified sampling, where some units are selected from each group
- All the units in the selected clusters are studied
115. Example
• In a study of knowledge, attitudes, and practices related to family planning in rural communities of a region, a list is made of all the villages.
• Using this list, a random sample of villages is chosen and all the adults in the selected villages are interviewed
116. Multi-stage sampling
• In a study of utilization of pit latrines in a district, 150 homesteads are to be visited for interviews with family members as well as for observations on types and cleanliness of latrines.
• The district is composed of six wards and each ward has between six and nine villages.
• The following four stage sampling procedure could be performed:
- Select three wards out of the six by simple random sampling
- For each ward, select five villages by simple random sampling (15 villages in total)
117. • For each village select ten households. Because simply choosing households in the center of the village would produce a biased sample, the following systematic sampling procedure is proposed:
- Go to the center of the village
- Choose a direction in a random way
- Walk in the chosen direction and select every third or every fifth household (depending on the size of the village) until you have the ten you need.
118. PROBLEM
• A population of cancer patients has a survival standard deviation of 43.4 months. If one wants to conduct a study on this population, how large a sample is needed so that 95% of the sample means of this size will be within ±6 months of the population mean? Population size is 480 patients. (85)
119. • In a survey of school children to determine the proportion of children immunized against polio, an investigator determined the maximum discrepancy between the sample and population proportion of immunized children to be 0.04, at a confidence level of 99%. Further, the investigator had previous knowledge that the prevalence among children in a similar community was 90%, and the total population of school children is 800.
120. • The mean weight of 100 children who are 5 years old in a certain locality is found to be 14 kg. A clinician wants to know the mean weight of all the children in that locality with a 95% confidence interval, if it is known that the SD for all children is 4 kg
121. • Suppose a survey is conducted on a representative sample of 900 newborn babies in A/A and it is found that their average weight at birth is 3.5 kg with SD of 0.5 kg. Estimate the mean weight of newborn babies in A/A at the 95% level of confidence.
122. • A sample of 20 houses was studied to estimate the mean sprayable area of a house for controlling a malaria epidemic. The result was x̄ = 22.9 m², with SD 6.0 m². Construct a CI for the mean sprayable area of the population with 95% confidence.
123. • A random sample of 100 people shows that 25 are left-handed. Form a 95% CI for the true proportion of left-handers.
126. • In a clinical trial for a new drug to treat hypertension, N1 = 50 patients were randomly assigned to receive the new drug, and N2 = 50 patients to receive a placebo. 34 of the patients receiving the drug showed improvement, while 15 of those receiving placebo showed improvement.
- Compute a 95% CI estimate for the difference between proportions improved.
127. • A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is not 30? The variance is known to be 20. Let CL = .95.
A. Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
B. Assumptions
Simple random sample
Normally distributed population
128. • A simple random sample of 14 people from a certain population gives a sample mean body mass index (BMI) of 30.5 and SD of 10.64. Can we conclude that the BMI is not 35 at α = 5%?
129. • The mean SUA levels of 12 individuals with Down's syndrome and 15 normal individuals are 4.5 and 3.4 mg/100 ml, respectively, with variances σ1² = 1 and σ2² = 1.5, respectively. Is there a difference between the means of both groups at α = 5%?
130. • We wish to know if we may conclude, at the 95% confidence level, that smokers, in general, have greater lung damage than do non-smokers.
131. • In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4%. If 10 cases of asthma are observed over a single year in a sample of 500 children whose mothers smoke, can we conclude that this is different from the underlying probability of p0 = 0.014 (or p = 1.4%)? CL = 95%
132. • Among the 225 students who ate the sandwiches, 109 became ill, while among the 38 students who did not eat the sandwiches, 4 became ill. Is there a significant difference between the two groups at α = 5%?