4. Figure 2. Empirical vs Theoretical distribution
Introduction
28-08-2017 4Dr Tanveer Rehman PSM JIPMER
5. Figure 3. Examples of different shapes for distributions
Frequency Distributions
1. Symmetrical
2. Skewed
6. Figure 4. Measures of central tendency
for three symmetrical distributions: normal, bimodal, and rectangular.
Central Tendency
7. When To Use The Median Or Mode
Figure 5. Measures of central tendency for skewed distributions
8. Figure 6. The scores in the sample are less variable (spread out) than the scores in the population.
Measures of Variability
1. Range
2. Variance
3. Standard deviation
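As a minimal sketch of the three measures listed above, the snippet below computes them for a small set of hypothetical scores (the data are an assumption for illustration, not taken from the slides). It also contrasts the population variance (SS divided by N) with the sample variance (SS divided by n - 1).

```python
import statistics

# Hypothetical scores, chosen only for illustration
scores = [2, 4, 4, 5, 7, 8]

score_range = max(scores) - min(scores)        # largest score minus smallest score
pop_variance = statistics.pvariance(scores)    # SS / N
sample_variance = statistics.variance(scores)  # SS / (n - 1)
sample_sd = statistics.stdev(scores)           # square root of the sample variance

print("range:", score_range)
print("population variance:", pop_variance)
print("sample variance:", sample_variance)
print("sample SD:", round(sample_sd, 3))
```

Dividing by n - 1 rather than n makes the sample variance slightly larger, which compensates for samples tending to be less variable than the population they come from.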
9. Figure 7. Following a z-score transformation, the X-axis is relabelled in z-score units.
Z or Standard Score
1. Purpose
2. Equation
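The slide text names the equation without reproducing it. Assuming the usual textbook definition, with X a raw score, μ the population mean and σ the population standard deviation, the z-score transformation and its inverse can be written as:

```latex
z = \frac{X - \mu}{\sigma}, \qquad X = \mu + z\sigma
```

The sign of z gives the position of the score above or below the mean, and its magnitude gives the distance from the mean in standard deviation units.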
10. Figure 8. The distributions of original scores, z-scores, and new standardized scores.
Z-score Distributions
1. Mean
2. S.D.
3. Shape
4. Standardized distribution
5. Disadvantage
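The properties listed above can be checked numerically. The sketch below standardizes a set of hypothetical raw scores and then rescales them to a new standardized distribution; the raw data and the target mean of 100 and SD of 15 are assumptions for illustration only.

```python
import statistics

# Hypothetical raw scores, chosen only for illustration
raw = [50, 60, 70, 80, 90]

mu = statistics.mean(raw)
sigma = statistics.pstdev(raw)  # population standard deviation

# z = (X - mu) / sigma for every score
z_scores = [(x - mu) / sigma for x in raw]

# Rescale the z-scores to a new standardized distribution (assumed targets)
new_mean, new_sd = 100, 15
standardized = [new_mean + z * new_sd for z in z_scores]

print("z mean:", round(statistics.mean(z_scores), 6))
print("z SD:", round(statistics.pstdev(z_scores), 6))
print("standardized mean:", round(statistics.mean(standardized), 6))
print("standardized SD:", round(statistics.pstdev(standardized), 6))
```

The z-scores always have a mean of 0 and an SD of 1, and every score keeps its relative position, so the shape of the distribution is unchanged.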
13. Figure 9. The normal distribution following a
z-score transformation.
Normal Distribution
1. Discoverer
2. Shape
3. Equation
4. Proportions
14. Figure 10. A portion of the unit normal table.
15. Application Of Normal Distribution
Q1. The population distribution of health expenditure is normal
with a mean of Rs. 5000 and a standard deviation of Rs. 1000.
What is the probability of randomly selecting an individual from
this population whose health expenditure is greater than Rs.
7000?
17. First, the probability question is translated into a proportion
question: out of all possible scores, what proportion is greater
than Rs. 7000?
The mean is μ = 5000, so the score X = 7000 lies to the right of the
mean. Because we are interested in all scores greater than 7000,
we shade the area to the right of 7000. This area represents the
proportion we are trying to determine.
Identify the exact position of X = 7000 by computing a z-score:
z = (X − μ)/σ = (7000 − 5000)/1000 = 2.00. That is, a score of
X = 7000 is exactly 2 standard deviations above the mean.
The proportion we are trying to determine may now be expressed in
terms of its z-score: p(z > 2.00) = ? All normal distributions,
regardless of the values of μ and σ, have 2.28% of the scores in the
tail beyond z = 2.00. Thus, for this population,
p(X > 7000) = p(z > 2.00) = 2.28%.
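The table lookup for Q1 can be reproduced in code. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the unit normal table; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 5000, 1000  # population mean and SD from Q1 (Rs.)
x = 7000                # expenditure of interest (Rs.)

z = (x - mu) / sigma                 # position of X in standard deviation units
p_greater = 1 - NormalDist().cdf(z)  # area in the upper tail beyond z

print(f"z = {z:.2f}")
print(f"p(X > {x}) = {p_greater:.4f}")
```

The computed tail area agrees with the 2.28% read from the unit normal table.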
18. Figure 11 depicts the typical distribution of
sample means
Central Limit Theorem
1. Marquis de Laplace
2. The distribution of sample means
3. Any of two conditions to satisfy
19. Figure 12 shows that the size of the
standard error decreases as the
sample size increases.
Standard Error of M
1. Serves the same two purposes
2. The overall mean is equal to μ
3. SD is the “starting point”
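The relationship between sample size and standard error can be sketched as follows, assuming the usual formula σ/√n. The value σ = 8 and the sample sizes are assumptions for illustration.

```python
import math

sigma = 8  # population SD; assumed value for illustration

def standard_error(sigma, n):
    """Standard error of M: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

for n in (1, 4, 16, 64):
    print(f"n = {n:2d}  standard error = {standard_error(sigma, n):.2f}")
```

With n = 1 the standard error equals σ, which is why the SD is the starting point, and each fourfold increase in n halves the standard error.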
20. Application Of Normal Distribution
Q2. A positively skewed distribution has μ = 60 and σ = 8.
a. What is the probability of obtaining a sample mean greater than M = 62
for a sample of n = 4 scores?
b. What is the probability of obtaining a sample mean greater than M = 62
for a sample of n = 64 scores?
21. Key to Q2.
a. With n = 4 scores from a skewed population, the distribution of
sample means does not satisfy either of the criteria for being normal
(a normal population, or a sample size of about 30 or more).
Therefore, you cannot use the unit normal table, and the probability
cannot be found by this method.
b. With n = 64, the distribution of sample means is nearly normal. The
standard error is σ/√n = 8/√64 = 1, the z-score is (62 − 60)/1 = 2.00,
and the probability is p(z > 2.00) = 0.0228.
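Part b of the key can be checked with a short calculation. The sketch below covers only the n = 64 case, where the normal approximation to the distribution of sample means is justified; it should not be applied to the n = 4 case in part a.

```python
import math
from statistics import NormalDist

mu, sigma = 60, 8  # population parameters from Q2
m, n = 62, 64      # sample mean of interest and sample size (part b)

se = sigma / math.sqrt(n)    # standard error of M
z = (m - mu) / se            # z-score of the sample mean
p = 1 - NormalDist().cdf(z)  # probability of a sample mean above M

print(f"standard error = {se:.2f}, z = {z:.2f}, p(M > {m}) = {p:.4f}")
```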
22. Figure 13. The critical region (very unlikely
outcomes) for α = 0.05.
Hypothesis testing
1. State the hypothesis
2. Set the criteria for a decision
3. Collect data and compute
sample statistics
4. Make a decision
23. Application Of Normal Distribution
Q3. A health care project begins with a known population of all PHC staff. In this
case, scores on a standardized test are normally distributed with µ = 65 and
σ = 15.
The Project officer suspects that special training in reading skills will produce a
change in the scores for the staff. Because it is not feasible to administer the
treatment (the special training) to everyone in the population, a sample of n = 25
individuals is selected, and the treatment is given to this sample.
Following treatment, the average score for this sample is M = 70. Is there
evidence that the training has an effect on test scores?
24. Key to Q3
1. State the hypotheses and select an alpha level. The null hypothesis states
that the special training has no effect. In symbols, H0: µ = 65 (after special
training, the mean is still 65). The alternative hypothesis states that the
training does have an effect. H1: µ ≠ 65 (after training, the mean is
different from 65). For this demonstration, we use α = 0.05. Thus, there is a
5% risk of committing a Type I error if we reject H0.
2. Locate the critical region. With α = 0.05, the critical region consists of
sample means that correspond to z-scores beyond the critical boundaries of
z = ±1.96.
3. Obtain the sample data, and compute the test statistic. According to the
null hypothesis, the distribution of sample means is normal with an expected
value of µ = 65 and a standard error of σ/√n = 15/√25 = 3. In this
distribution, our sample mean of M = 70 corresponds to a z-score of
(70 − 65)/3 = 1.67.
4. Make a decision about H0, and state the conclusion. The z-score we obtained
is not in the critical region. This indicates that our sample mean of M = 70
is not an extreme or unusual value to obtain from a population with µ = 65.
Therefore, our statistical decision is to fail to reject H0. Our conclusion
for the study is that the data do not provide sufficient evidence that the
special training changes test scores.
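The four steps of this z-test can be sketched in code. The critical value is computed with `statistics.NormalDist` rather than read from a table, and the two-tailed p-value is an addition that the key does not report.

```python
import math
from statistics import NormalDist

mu0, sigma = 65, 15  # population mean under H0 and known population SD
m, n = 70, 25        # sample mean after training and sample size
alpha = 0.05         # two-tailed alpha level

se = sigma / math.sqrt(n)                          # standard error of M
z = (m - mu0) / se                                 # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # critical boundary, about 1.96
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
reject_h0 = abs(z) > z_crit

print(f"SE = {se:.2f}, z = {z:.2f}, critical z = ±{z_crit:.2f}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {reject_h0}")
```

Because the z-score falls inside the critical boundaries, the sketch reaches the same decision as the key: fail to reject H0.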
25. Figure 14 shows that t distributions have more variability than the normal
distribution, indicated by the flatter and more spread-out shape.
t Distribution
1. W. S. Gosset
2. Use
3. t statistic formula
4. Every sample like z-score
5. Approximation depends on df
6. Shape
26. Application Of t Distribution
Q4. A research project by Central Government would like to determine
whether there is a relationship between depression and aging.
It is known that the general population averages μ = 40 on a standardized
depression test.
The project officer obtains a sample of n = 9 individuals who are all more
than 70 years old.
The depression scores for this sample are as follows: 37, 50, 43, 41, 39, 45,
49, 44, 48.
On the basis of this sample, is depression for elderly people significantly
different from depression in the general population?
28. H0: μ = 40. With df = 8 and α = 0.05 (two-tailed), the critical values are
t = ±2.306. For these data, M = 44, SS = 162, s² = 20.25, the standard error
is 1.50, and t = 2.67.
Reject H0 and conclude that depression for the elderly is significantly
different from depression for the general population.
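The intermediate values in this key can be reproduced from the raw scores. The sketch below uses only the standard library, so the critical value of 2.306 is taken from the slide rather than computed.

```python
import math
import statistics

scores = [37, 50, 43, 41, 39, 45, 49, 44, 48]  # depression scores from Q4
mu0 = 40        # general population mean under H0
t_crit = 2.306  # critical t for df = 8, taken from the slide

n = len(scores)
m = statistics.mean(scores)             # sample mean
ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations
s2 = ss / (n - 1)                       # sample variance
se = math.sqrt(s2 / n)                  # estimated standard error of M
t = (m - mu0) / se                      # t statistic
reject_h0 = abs(t) > t_crit

print(f"M = {m}, SS = {ss}, s2 = {s2}, SE = {se:.2f}, t = {t:.2f}")
print("reject H0:", reject_h0)
```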
29. Figure 15. The binomial distribution is always a discrete
histogram, and the normal distribution is a continuous,
smooth curve.
Binomial distributions
1. Two categories
2. Criteria for normality
3. Mean and SD
4. Z score
30. Application Of Binomial Distribution
Q5. A national health organization predicts that 50% of residents of a
particular area will get the flu this season. If a sample of 40 residents is
selected from the population, what is the probability that at least 26 of the
people will be diagnosed with the flu?
32. Probability of flu: p = 0.50
q = 1 − p = 0.50; n = 40
pn = 20 and qn = 20, both at least 10, so the binomial distribution is
approximately normal
Mean = pn = 20; SD = √(npq) = √10 = 3.16
We want the probability that 26 or more get the flu; on the continuous
normal scale, the real lower limit of 26 is 25.5
z = (25.5 − 20)/3.16 = 1.74
From the unit normal table, p(z > 1.74) = 0.0409, or 4.09%
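As a check on the approximation, the sketch below computes the normal-approximation probability and compares it with the exact binomial sum. The exact calculation is an addition for comparison and is not part of the slide's method.

```python
import math
from statistics import NormalDist

n, p = 40, 0.5  # sample size and probability of flu from Q5
q = 1 - p

mean = n * p                 # mean of the binomial distribution
sd = math.sqrt(n * p * q)    # SD of the binomial distribution
z = (25.5 - mean) / sd       # z-score at the real lower limit of X = 26
p_normal = 1 - NormalDist().cdf(z)

# Exact binomial probability of 26 or more cases, for comparison
p_exact = sum(math.comb(n, k) * p**k * q**(n - k) for k in range(26, n + 1))

print(f"z = {z:.2f}, normal approximation = {p_normal:.4f}")
print(f"exact binomial = {p_exact:.4f}")
```

The two results should differ only slightly, which is what the pn and qn criterion is meant to guarantee.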
33. Figure 16. Chi-square distributions are
positively skewed. The critical region is placed
in the extreme tail, which reflects large
chi-square values.
Chi-square Distribution
1. Formula
2. Chi-square value
3. Characteristics
34. Application of Chi-square Distribution
Q6. Consider the following results from an influenza vaccine trial carried out
during an epidemic. Of 460 adults who took part, 240 received influenza
vaccination and 220 placebo vaccination. Overall 100 people contracted
influenza, of whom 20 were in the vaccine group and 80 in the placebo
group. We now wish to assess the strength of the evidence that vaccination
affected the probability of contracting influenza.
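One way to work Q6 is a chi-square test on the 2 × 2 table of vaccine group by influenza status. The sketch below assumes the standard Pearson statistic without a continuity correction. The p-value shortcut using `math.erfc` holds only for df = 1, and the critical value of 3.84 is the standard table value for df = 1 at α = 0.05.

```python
import math

# Rows: vaccine, placebo. Columns: influenza, no influenza (counts from Q6).
observed = [[20, 220],
            [80, 140]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / total  # expected count under H0
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # 1 for a 2 x 2 table
p_value = math.erfc(math.sqrt(chi2 / 2))           # upper tail, valid for df = 1 only
chi2_crit = 3.84                                   # table value, df = 1, alpha = 0.05

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.2e}")
print("reject H0:", chi2 > chi2_crit)
```

The statistic lies far beyond the critical value, so under these assumptions the data give strong evidence that vaccination affected the probability of contracting influenza.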
37. Poisson Distribution
1. The number of discrete occurrences of an event during a period of time
2. Independent and random
3. The occurrences in each interval can range from zero to infinity
4. Depends upon just one parameter: the mean number of occurrences in
periods of the same length
P(x) = e^{-µ} µ^{x} / x!
38. Figure 17. The distribution is very skewed for small means, when there is a
sizeable probability that zero events will be observed. It is nearly symmetrical
for large means and is adequately approximated by the normal distribution for
means of µ = 10 or more.
39. Application of Poisson distribution
Q7. A district health authority which plans to close the smaller of two
maternity units is assessing the extra demand this will place on the
remaining unit.
At present the larger unit averages 4.2 admissions per day and can cope
with a maximum of 10 admissions per day. This results in the unit’s capacity
being exceeded only on about one day per year.
After the closure of the smaller unit the average number of admissions is
expected to increase to 6.1 per day.
Estimate the proportion of days on which the unit’s capacity is then likely to
be exceeded.
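A sketch of one way to estimate this: assuming daily admissions follow a Poisson distribution, the probability of exceeding capacity is one minus the sum of P(x) for x = 0 to 10. The function names are illustrative.

```python
import math

def poisson_pmf(x, mu):
    """P(x) = e^(-mu) * mu^x / x! for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def prob_exceeds(capacity, mu):
    """Probability that the number of admissions is greater than capacity."""
    return 1 - sum(poisson_pmf(x, mu) for x in range(capacity + 1))

capacity = 10
p_before = prob_exceeds(capacity, 4.2)  # current average admissions per day
p_after = prob_exceeds(capacity, 6.1)   # expected average after the closure

print(f"before: p = {p_before:.4f}, about {p_before * 365:.1f} days per year")
print(f"after:  p = {p_after:.4f}, about {p_after * 365:.1f} days per year")
```

Under this assumption, capacity would be exceeded on roughly 4.7% of days after the closure, or about 17 days per year.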
41. Figure 18 shows that, of all the values in the
distribution, only 5% are larger than F = 3.88,
and only 1% are larger than F = 6.93.
F - Distribution
1. ANOVA
2. Advantage over t tests
3. F-ratio
4. Characteristics
5. Exact shape depends on df
42. Table 1. A portion of the F distribution table.
43. Application of F-distribution
Q8. A researcher obtains F = 4.18 with df = 2, 15.
Is this value sufficient to reject H0 with α = 0.05?
Is it big enough to reject H0 if α = 0.01?
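The slides give no key for Q8, so the following is a hedged sketch. Standard F tables list critical values of about 3.68 (α = 0.05) and 6.36 (α = 0.01) for df = 2, 15. The code instead computes the upper-tail probability directly, using a closed form that applies only when the numerator df is 2; the function name is illustrative.

```python
def f_upper_tail_df1_2(f, df2):
    """P(F > f) when the numerator df is 2 (closed form, not valid otherwise)."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

f_obs, df2 = 4.18, 15  # observed F-ratio and denominator df from Q8

p_value = f_upper_tail_df1_2(f_obs, df2)
reject_05 = p_value < 0.05
reject_01 = p_value < 0.01

print(f"p = {p_value:.4f}")
print("reject H0 at alpha = 0.05:", reject_05)
print("reject H0 at alpha = 0.01:", reject_01)
```

On this calculation, F = 4.18 is large enough to reject H0 at α = 0.05 but not at α = 0.01.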
44. Thank You
There is an old saying that goes, “If it looks like a duck, walks like
a duck and quacks like a duck, then it is a duck.” If it
looks/walks/quacks like a duck, the statistician will use the
inferential reasoning appropriate for ducks, despite having no
real assurance that this bird actually has duck DNA.