Turning from discrete to continuous distributions, in this section we discuss the normal distribution. This is the most important continuous distribution because in applications many random variables are normal random variables (that is, they have a normal distribution) or they are approximately normal or can be transformed into normal random variables in a relatively simple fashion. Furthermore, the normal distribution is a useful approximation of more complicated distributions, and it also occurs in the proofs of various statistical tests.
The normal distribution, also called the Gaussian distribution, is one of the most widely used continuous distributions. It is used to model quantities such as students' marks, people's heights, and the salaries of working people.
Each binomial distribution is defined by n, the number of trials and p, the probability of success in any one trial.
Each Poisson distribution is defined by its mean.
In the same way, each Normal distribution is identified by two defining characteristics or parameters: its mean and standard deviation.
The Normal distribution has three distinguishing features:
• It is unimodal, in other words there is a single peak.
• It is symmetrical, one side is the mirror image of the other.
• It is asymptotic, that is, it tails off very gradually on each side but the line representing the distribution never quite meets the horizontal axis.
Chapter 6: Normal Probability Distribution
6.1: The Standard Normal Distribution
The Normal Distribution:
There are different kinds of distributions, such as normal, skewed, and binomial.
Objectives:
Normal distribution, its properties, and its use in biostatistics
Transformation to standard normal distribution
Calculation of probabilities from standard normal distribution using Z table.
Normal distribution:
- Certain data, when graphed as a histogram (data on the horizontal axis, frequency on the vertical axis), create a bell-shaped curve known as a normal curve, or normal distribution.
- Two parameters define the normal distribution, the mean (µ) and the standard deviation (σ).
Properties of the Normal Distribution:
Normal distributions are symmetrical with a single central peak at the mean (average) of the data.
The shape of the curve is described as bell-shaped with the graph falling off evenly on either side of the mean.
Fifty percent of the distribution lies to the left of the mean and fifty percent lies to the right of the mean.
- The mean, the median, and the mode fall in the same place. In a normal distribution the mean = the median = the mode.
- The spread of a normal distribution is controlled by the standard deviation.
- In all normal distributions the range ±3σ around the mean includes nearly all cases (about 99.7%).
Unimodal:
One mode
Symmetrical:
Left and right halves are mirror images
Bell-shaped:
With maximum height at the mean, median, mode
Continuous:
There is a value of Y for every value of X
Asymptotic:
The farther the curve goes from the mean, the closer it gets to the X axis but it never touches it (or goes to 0).
The total area under a normal distribution curve is equal to 1.00, or 100%.
Using Normal distribution for finding probability:
To find the probability that an observation falls in a given range, we find the area under the curve over that range. This area always lies between 0 and 1.
Transforming normal distribution to standard normal distribution:
Given the mean and standard deviation of a normal distribution the probability of occurrence can be worked out for any value.
But these would differ from one distribution to another because of differences in the numerical value of the means and standard deviations.
To get around this problem it is necessary to find a common unit of measurement into which any score can be converted, so that one table will do for all normal distributions.
This common unit is the standard normal distribution, or Z score, and the table used for it is called the Z table.
- A z score always reflects the number of standard deviations above or below the mean a particular score or value is:
z = (X – μ)/σ
where
X is a score from the original normal distribution,
μ is the mean of the original normal distribution, and
σ is the standard deviation of the original normal distribution.
Steps for calculating probability using the Z-score:
- Sketch a bell-shaped curve
- Shade the area (which represents the probability)
- Use the Z-score formula to calculate Z-value(s)
- Look up Z-values in table
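The standardize-then-look-up steps can be sketched in Python. This is only an illustration: it assumes Python 3.8 or later, where the standard library's `statistics.NormalDist` can stand in for the Z table. The helper names `z_score` and `prob_below` and the example numbers (mean 50, standard deviation 5, score 60) are hypothetical, not from the slides.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def prob_below(x, mu, sigma):
    """P(X < x) for a normal X, found by standardizing and using the standard normal CDF."""
    return NormalDist().cdf(z_score(x, mu, sigma))

# Hypothetical example: a score of 60 from a distribution with mean 50 and SD 5.
z = z_score(60, 50, 5)       # 2.0 standard deviations above the mean
p = prob_below(60, 50, 5)    # area to the left of z = 2.0
print(z, round(p, 4))
```

Here the CDF call plays the role of the table lookup; the sketching and shading steps remain useful for checking that the answer is plausible.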
3. Properties of a Normal Distribution
• The mean, median, and mode are equal
• Bell shaped and is symmetric about the mean
• The total area that lies under the curve is one or 100%
4. • As the curve extends farther and farther away from the
mean, it gets closer and closer to the x-axis but never
touches it.
• The points at which the curvature changes are called
inflection points. The graph curves downward between the
inflection points and curves upward past the inflection
points to the left and to the right.
[Figure: normal curve with the two inflection points marked]
Properties of a Normal Distribution
5. Means and Standard Deviations
[Figure: two panels of normal curves; horizontal axis ticks run from about 9 to 22]
Curves with different means, different standard deviations
Curves with different means, same standard deviation
6. Empirical Rule
• About 68% of the area lies within 1 standard deviation of the mean
• About 95% of the area lies within 2 standard deviations of the mean
• About 99.7% of the area lies within 3 standard deviations of the mean
7. Determining Intervals
An instruction manual claims that the assembly time for a product is normally distributed with a mean of 4.2 hours and a standard deviation of 0.3 hour. Determine the interval in which 95% of the assembly times fall.
95% of the data will fall within 2 standard deviations of the mean.
4.2 – 2(0.3) = 3.6 and 4.2 + 2(0.3) = 4.8.
95% of the assembly times will be between 3.6 and 4.8 hours.
[Figure: normal curve with the horizontal axis marked at 3.3, 3.6, 3.9, 4.2, 4.5, 4.8, 5.1]
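As a rough check of the Empirical Rule and of the assembly-time interval, the areas can be computed directly. The sketch below assumes Python 3.8+ and uses `statistics.NormalDist`; the helper `area_within` is a name introduced here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))

# Assembly-time example: mean 4.2 hours, standard deviation 0.3 hour.
mu, sigma = 4.2, 0.3
low, high = mu - 2 * sigma, mu + 2 * sigma
assembly = NormalDist(mu, sigma)
covered = assembly.cdf(high) - assembly.cdf(low)
print(round(low, 1), round(high, 1), round(covered, 4))
```

The computed areas are slightly different from the rounded 68%, 95% and 99.7% figures, which is why the rule is stated with "about".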
9. The Standard Score
The standard score, or z-score, represents the number of
standard deviations a random variable x falls from the
mean.
The test scores for a civil service exam are normally
distributed with a mean of 152 and a standard deviation of
7. Find the standard z-score for a person with a score of:
(a) 161 (b) 148 (c) 152
(a) z = (161 – 152)/7 ≈ 1.29 (b) z = (148 – 152)/7 ≈ –0.57 (c) z = (152 – 152)/7 = 0
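The three standard scores for the civil service exam can be checked with a few lines of Python. This is a minimal sketch; the variable names are chosen here for illustration.

```python
mu, sigma = 152, 7  # civil service exam: mean and standard deviation

# Standard score for each raw score: (value - mean) / standard deviation
z_scores = {score: round((score - mu) / sigma, 2) for score in (161, 148, 152)}

for score, z in z_scores.items():
    print(score, z)
```

A positive z-score means the score is above the mean, a negative one means it is below, and a score equal to the mean has z = 0.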
10. The Standard Normal Distribution
The standard normal distribution has a mean of 0 and a
standard deviation of 1.
Using z-scores any normal distribution can be
transformed into the standard normal distribution.
11. Cumulative Areas
• The cumulative area is close to 0 for z-scores close to –3.49.
• The cumulative area for z = 0 is 0.5000.
• The cumulative area is close to 1 for z-scores close to 3.49.
The total area under the curve is one.
12. Cumulative Areas
Find the cumulative area for a z-score of –1.25.
Read down the z column on the left to z = –1.25 and across to
the column under .05. The value in the cell is 0.1056, the
cumulative area.
The probability that z is at most –1.25 is 0.1056.
13. Finding Probabilities
To find the probability that z is less than a given value,
read the cumulative area in the table corresponding to
that z-score.
Find P(z < –1.45).
Read down the z-column to –1.4 and across to .05. The cumulative area is 0.0735.
P(z < –1.45) = 0.0735
14. Finding Probabilities
To find the probability that z is greater than a given
value, subtract the cumulative area in the table
from 1.
Find P(z > –1.24).
The cumulative area (area to the left) is 0.1075. So the area to the right is 1 – 0.1075 = 0.8925.
P(z > –1.24) = 0.8925
15. Finding Probabilities
To find the probability z is between two given values, find the
cumulative areas for each and subtract the smaller area from
the larger.
Find P(–1.25 < z < 1.17).
1. P(z < 1.17) = 0.8790 2. P(z < –1.25) = 0.1056
3. P(–1.25 < z < 1.17) = 0.8790 – 0.1056 = 0.7734
16. Summary
• To find the probability that z is less than a given value, read the corresponding cumulative area.
• To find the probability that z is greater than a given value, subtract the cumulative area in the table from 1.
• To find the probability that z is between two given values, find the cumulative areas for each and subtract the smaller area from the larger.
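The three cases in the summary can be written as three small functions. The sketch assumes Python 3.8+ and uses `statistics.NormalDist` in place of the printed table; the function names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def p_less(z):
    """P(Z < z): read the cumulative area directly."""
    return Z.cdf(z)

def p_greater(z):
    """P(Z > z): subtract the cumulative area from 1."""
    return 1 - Z.cdf(z)

def p_between(z1, z2):
    """P(z1 < Z < z2): larger cumulative area minus the smaller one."""
    lo, hi = sorted((z1, z2))
    return Z.cdf(hi) - Z.cdf(lo)

print(round(p_less(-1.45), 4))           # "less than" example
print(round(p_greater(-1.24), 4))        # "greater than" example
print(round(p_between(-1.25, 1.17), 4))  # "between" example
```

Because the table rounds each area to four decimal places before subtracting, a computed result may differ from the table-based answer in the last digit.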
18. Probabilities and Normal Distributions
If a random variable, x is normally distributed, the
probability that x will fall within an interval is equal to the
area under the curve in the interval.
IQ scores are normally distributed with a mean of 100
and a standard deviation of 15. Find the probability that a
person selected at random will have an IQ score less
than 115.
To find the area in this interval, first find the standard score equivalent to x = 115:
z = (115 – 100)/15 = 1
19. Probabilities and Normal Distributions
Normal Distribution (mean 100, standard deviation 15): Find P(x < 115).
Standard Normal Distribution: Find P(z < 1).
The two areas are the same: P(z < 1) = 0.8413, so P(x < 115) = 0.8413.
20. Application
Monthly utility bills in a certain city are normally distributed with a mean of $100 and a standard deviation of $12. A utility bill is randomly selected. Find the probability it is between $80 and $115.
Normal Distribution: P(80 < x < 115)
Standard scores: z = (80 – 100)/12 ≈ –1.67 and z = (115 – 100)/12 = 1.25
P(–1.67 < z < 1.25) = 0.8944 – 0.0475 = 0.8469
The probability a utility bill is between $80 and $115 is 0.8469.
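Both the IQ example and the utility-bill example can be checked numerically. The sketch below assumes Python 3.8+ (`statistics.NormalDist`). It computes each probability on the original scale and again after standardizing, to show that the two areas agree; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# IQ scores: mean 100, standard deviation 15.
iq = NormalDist(100, 15)
p_iq_direct = iq.cdf(115)               # P(x < 115) on the original scale
p_iq_standard = Z.cdf((115 - 100) / 15) # P(z < 1) on the standard scale

# Utility bills: mean $100, standard deviation $12.
bills = NormalDist(100, 12)
p_bills = bills.cdf(115) - bills.cdf(80)

# Table-style version: round each z-score to two decimals first.
z_low = round((80 - 100) / 12, 2)
z_high = round((115 - 100) / 12, 2)
p_bills_table_style = Z.cdf(z_high) - Z.cdf(z_low)

print(round(p_iq_direct, 4), round(p_iq_standard, 4))
print(round(p_bills, 4), round(p_bills_table_style, 4))
```

The small gap between the two utility-bill results comes only from rounding z to two decimal places, as a printed table requires.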
22. From Areas to z-Scores
Find the z-score corresponding to a cumulative area of 0.9803.
Locate 0.9803 in the area portion of the table. Read the values at the beginning of the corresponding row and at the top of the column. The z-score is 2.06.
z = 2.06 corresponds roughly to the 98th percentile.
23. Finding z-Scores from Areas
Find the z-score corresponding to the 90th percentile.
The closest table area is .8997. The row heading is
1.2 and column heading is .08. This corresponds to
z = 1.28.
A z-score of 1.28 corresponds to the 90th percentile.
24. Finding z-Scores from Areas
Find the z-score with an area of .60 falling to its right.
With .60 to the right, the cumulative area is .40. The closest area is .4013. The row heading is –0.2 and the column heading is .05. The z-score is –0.25.
A z-score of –0.25 has an area of .60 to its right. It also corresponds to the 40th percentile.
25. Finding z-Scores from Areas
Find the z-score such that 45% of the area under the curve falls between –z and z.
The area remaining in the tails is .55. Half this area is in each tail, so .55/2 = .275 is the cumulative area for the negative z value and .275 + .45 = .725 is the cumulative area for the positive z. The closest table area to .275 is .2743, so the negative z-score is –0.60. By symmetry, the positive z-score is 0.60.
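Going from an area back to a z-score is the inverse of the table lookup. As a sketch (assuming Python 3.8+), `NormalDist.inv_cdf` returns the z-score for a given cumulative area; the helper name `z_for_cumulative_area` is introduced here for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def z_for_cumulative_area(area):
    """z-score whose area to the left equals `area` (0 < area < 1)."""
    return Z.inv_cdf(area)

print(round(z_for_cumulative_area(0.9803), 2))    # cumulative area 0.9803
print(round(z_for_cumulative_area(0.90), 2))      # 90th percentile
print(round(z_for_cumulative_area(1 - 0.60), 2))  # area .60 to the right

# 45% of the area between -z and z: each tail holds (1 - 0.45) / 2.
z_neg = z_for_cumulative_area((1 - 0.45) / 2)
print(round(z_neg, 2), round(-z_neg, 2))
```

An area given to the right must first be converted to a cumulative (left) area, which is why the third call passes 1 – 0.60.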
26. From z-Scores to Raw Scores
The test scores for a civil service exam are normally
distributed with a mean of 152 and a standard deviation of 7.
Find the test score for a person with a standard score of:
(a) 2.33 (b) –1.75 (c) 0
To find the data value x when given a standard score z, use x = μ + zσ:
(a) x = 152 + (2.33)(7) = 168.31
(b) x = 152 + (–1.75)(7) = 139.75
(c) x = 152 + (0)(7) = 152
27. Finding Percentiles or Cut-off Values
Monthly utility bills in a certain city are normally distributed
with a mean of $100 and a standard deviation of $12. What is
the smallest utility bill that can be in the top 10% of the bills?
Find the cumulative area in the table that is closest to 0.9000 (the 90th percentile). The area 0.8997 corresponds to a z-score of 1.28.
To find the corresponding x-value, use x = μ + zσ:
x = 100 + 1.28(12) = 115.36.
$115.36 is the smallest value for the top 10%.
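The conversion from a standard score back to a data value, x = μ + zσ, can be sketched as follows. The code assumes Python 3.8+; `raw_score` is a hypothetical helper name. It also compares the table-based cut-off with the value from `NormalDist.inv_cdf`, which does not round z to two decimals.

```python
from statistics import NormalDist

def raw_score(z, mu, sigma):
    """Data value x that corresponds to standard score z: x = mu + z * sigma."""
    return mu + z * sigma

# Civil service exam: mean 152, standard deviation 7.
exam_scores = [raw_score(z, 152, 7) for z in (2.33, -1.75, 0)]
print([round(x, 2) for x in exam_scores])

# Utility bills: mean $100, standard deviation $12, top 10% cut-off.
cutoff_table = raw_score(1.28, 100, 12)            # z read from the table
cutoff_exact = NormalDist(100, 12).inv_cdf(0.90)   # z not rounded
print(round(cutoff_table, 2), round(cutoff_exact, 2))
```

The two cut-offs differ by a couple of cents because the table only offers z-scores to two decimal places.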
29. Sampling Distributions
A sampling distribution is the probability distribution of a sample statistic that is formed when samples of size n are repeatedly taken from a population. If the sample statistic is the sample mean, then the distribution is the sampling distribution of sample means.
The sampling distribution consists of the values of the sample means.
30. The Central Limit Theorem
If a sample of size n ≥ 30 is taken from a population with any type of distribution that has mean μ and standard deviation σ, the sample means will have an approximately normal distribution with mean μ_x̄ = μ and standard deviation σ_x̄ = σ/√n.
31. The Central Limit Theorem
If a sample of any size is taken from a population with a normal distribution with mean μ and standard deviation σ, the distribution of means of sample size n will be normal with mean μ_x̄ = μ and standard deviation σ_x̄ = σ/√n.
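A small simulation can make the Central Limit Theorem concrete. The sketch below is an illustration under stated assumptions, not part of the slides: the population is taken to be exponential with rate 1 (strongly skewed, with mean 1 and standard deviation 1), the sample size is 36, and 5000 samples are drawn with a fixed seed.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population for illustration: exponential with rate 1,
# which is right-skewed and has mean 1 and standard deviation 1.
mu, sigma = 1.0, 1.0
n = 36              # sample size (at least 30)
num_samples = 5000  # number of repeated samples

sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

center = mean(sample_means)      # should be close to mu
spread = stdev(sample_means)     # should be close to sigma / sqrt(n)
predicted_spread = sigma / n ** 0.5

print(round(center, 3), round(spread, 3), round(predicted_spread, 3))
```

Even though the individual values are far from normal, the simulated sample means cluster around μ with a spread close to σ/√n, which is what the theorem predicts.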
32. Application
The mean height of American men (ages 20-29) is μ = 69.2 inches. Random samples of 60 such men are selected. Find the mean and standard deviation (standard error) of the sampling distribution.
The distribution of means of sample size 60 will be normal.
Mean: μ_x̄ = μ = 69.2
Standard deviation: σ_x̄ = σ/√n = σ/√60. With σ = 2.9 inches (the value assumed on the next slide), σ_x̄ = 2.9/√60 ≈ 0.3744.
33. Interpreting the Central Limit Theorem
The mean height of American men (ages 20-29) is μ = 69.2”. If a random sample of 60 men in this age group is selected, what is the probability the mean height for the sample is greater than 70”? Assume the standard deviation is 2.9”.
Since n > 30 the sampling distribution of x̄ will be normal, with
mean μ_x̄ = 69.2
standard deviation σ_x̄ = 2.9/√60 ≈ 0.3744
Find the z-score for a sample mean of 70:
z = (70 – 69.2)/0.3744 ≈ 2.14
34. Interpreting the Central Limit Theorem
P(x̄ > 70) = P(z > 2.14) = 1 – 0.9838 = 0.0162
There is a 0.0162 probability that a sample of 60 men will have a mean height greater than 70”.
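The height example can be reproduced in a few lines. The sketch assumes Python 3.8+ and uses `statistics.NormalDist`; it also computes the probability for a single man, to contrast an individual value with a sample mean.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 69.2, 2.9, 60  # population mean, population SD, sample size

standard_error = sigma / sqrt(n)   # SD of the sampling distribution of the mean
z = (70 - mu) / standard_error     # z-score for a sample mean of 70

# P(sample mean > 70): use the sampling distribution, not the population.
p_sample_mean = 1 - NormalDist(mu, standard_error).cdf(70)

# P(one randomly chosen man > 70): use the population distribution.
p_one_man = 1 - NormalDist(mu, sigma).cdf(70)

print(round(standard_error, 4), round(z, 2))
print(round(p_sample_mean, 4), round(p_one_man, 4))
```

The computed probabilities differ slightly from the table-based ones because z is not rounded to two decimals here. The contrast is the point: a single height above 70 inches is common, while a sample mean above 70 inches is rare.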
35. Application Central Limit Theorem
During a certain week the mean price of gasoline in California was $1.164 per gallon. What is the probability that the mean price for a sample of 38 gas stations in California is between $1.169 and $1.179? Assume the standard deviation = $0.049.
Since n > 30 the sampling distribution of x̄ will be normal, with
mean μ_x̄ = 1.164
standard deviation σ_x̄ = 0.049/√38 ≈ 0.0079
Calculate the standard z-score for sample values of $1.169 and $1.179:
z = (1.169 – 1.164)/0.0079 ≈ 0.63 and z = (1.179 – 1.164)/0.0079 ≈ 1.90
36. Application Central Limit Theorem
P(0.63 < z < 1.90)
= 0.9713 – 0.7357
= 0.2356
The probability is 0.2356 that the mean for the sample is between $1.169 and $1.179.
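The gasoline example follows the same pattern, with a "between" probability for the sample mean. This is a sketch assuming Python 3.8+; the helper `prob_sample_mean_between` is a name introduced here.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(mu, sigma, n, low, high):
    """P(low < sample mean < high) using a normal sampling distribution."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

# Gasoline prices: mean $1.164, SD $0.049, sample of 38 stations.
p_gas = prob_sample_mean_between(1.164, 0.049, 38, 1.169, 1.179)
print(round(p_gas, 4))
```

The result is close to, but not identical to, the slide's 0.2356, because the slide rounds the standard error to 0.0079 and each z-score to two decimals before using the table.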
38. Binomial Distribution Characteristics
• There are a fixed number of independent trials. (n)
• Each trial has 2 outcomes, Success or Failure.
• The probability of success on a single trial is p and
the probability of failure is q. p + q = 1
• We can find the probability of exactly x successes out
of n trials. Where x = 0 or 1 or 2 … n.
• x is a discrete random variable representing a count
of the number of successes in n trials.
39. Application
34% of Americans have type A+ blood. If 500 Americans are sampled at random, what is the probability at least 300 have type A+ blood?
Using techniques of Chapter 4 you could calculate the probability that exactly 300, exactly 301 … exactly 500 Americans have A+ blood type and add the probabilities.
Or … you could use the normal curve probabilities to approximate the binomial probabilities.
If np ≥ 5 and nq ≥ 5, the binomial random variable x is approximately normally distributed with mean μ = np and standard deviation σ = √(npq).
40. Why Do We Require np ≥ 5 and nq ≥ 5?
[Figure: binomial probability histograms for three values of n with p = 0.25]
• n = 5, p = 0.25, q = 0.75: np = 1.25, nq = 3.75
• n = 20, p = 0.25: np = 5, nq = 15
• n = 50, p = 0.25: np = 12.5, nq = 37.5
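One way to see why the conditions on np and nq matter is to measure how far the normal curve is from the binomial histogram for each of the three cases. The sketch below assumes Python 3.8+; the error measure (the largest gap between a histogram bar and the normal area over the same bar) and the helper names are choices made here for illustration, not something taken from the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def worst_bar_error(n, p):
    """Largest gap between a histogram bar and the normal area over that bar."""
    approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return max(
        abs(binom_pmf(k, n, p) - (approx.cdf(k + 0.5) - approx.cdf(k - 0.5)))
        for k in range(n + 1)
    )

errors = {n: worst_bar_error(n, 0.25) for n in (5, 20, 50)}
for n, err in errors.items():
    print(n, round(err, 4))
```

With p fixed at 0.25, the worst-case gap shrinks as n grows, which matches the way the histograms look more bell-shaped for larger n.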
41. Binomial Probabilities
The binomial distribution is discrete with a probability
histogram graph. The probability that a specific value of
x will occur is equal to the area of the rectangle with
midpoint at x.
If n = 50 and p = 0.25 find P(14 ≤ x ≤ 16).
Add the areas of the rectangles with midpoints at x = 14, x = 15, x = 16.
P(x = 14) = 0.111, P(x = 15) = 0.089, P(x = 16) = 0.065
0.111 + 0.089 + 0.065 = 0.265
42. Correction for Continuity
Use the normal approximation to the binomial to find P(14 ≤ x ≤ 16).
Values for the binomial random variable x are 14, 15 and 16.
43. Correction for Continuity
Use the normal approximation to the binomial to find P(14 ≤ x ≤ 16).
The interval of values under the normal curve is 13.5 ≤ x ≤ 16.5.
To ensure the boundaries of each rectangle are included in the interval, subtract 0.5 from a left-hand boundary and add 0.5 to a right-hand boundary.
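44. Normal Approximation to the Binomial
Use the normal approximation to the binomial to find P(14 ≤ x ≤ 16) when n = 50 and p = 0.25.
Find the mean and standard deviation using binomial distribution formulas: μ = np = 50(0.25) = 12.5 and σ = √(npq) = √(50(0.25)(0.75)) ≈ 3.06.
Adjust the endpoints to correct for continuity: P(13.5 ≤ x ≤ 16.5).
Convert each endpoint to a standard score: z = (13.5 – 12.5)/3.06 ≈ 0.33 and z = (16.5 – 12.5)/3.06 ≈ 1.31.
P(0.33 ≤ z ≤ 1.31) = 0.9049 – 0.6293 = 0.2756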
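The exact binomial sum and the continuity-corrected normal approximation for n = 50, p = 0.25 can be compared side by side. This is a minimal sketch assuming Python 3.8+ (`math.comb` and `statistics.NormalDist`); `binom_pmf` is a helper defined here.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 50, 0.25
q = 1 - p

# Exact: add the three rectangle areas for x = 14, 15, 16.
exact = sum(binom_pmf(k, n, p) for k in (14, 15, 16))

# Approximation: normal curve with the binomial mean and standard deviation,
# evaluated over the continuity-corrected interval 13.5 to 16.5.
approx_dist = NormalDist(n * p, sqrt(n * p * q))
approx = approx_dist.cdf(16.5) - approx_dist.cdf(13.5)

print(round(exact, 4), round(approx, 4))
```

The two numbers are close but not equal, which is what "approximation" means here; the correction for continuity is what keeps the gap small.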
45. Application
A survey of Internet users found that 75% favored
government regulations of “junk” e-mail. If 200 Internet
users are randomly selected, find the probability that fewer
than 140 are in favor of government regulation.
Since np = 150 ≥ 5 and nq = 50 ≥ 5 use the normal approximation to the binomial.
The binomial phrase “fewer than 140” means 0, 1, 2, 3 … 139.
Use the correction for continuity to translate to the continuous variable in the interval (–∞, 139.5). Find P(x < 139.5).
46. Application
A survey of Internet users found that 75% favored
government regulations of “junk” e-mail. If 200 Internet
users are randomly selected, find the probability that fewer
than 140 are in favor of government regulation.
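Use the correction for continuity: P(x < 139.5).
μ = np = 200(0.75) = 150 and σ = √(npq) = √(200(0.75)(0.25)) ≈ 6.1237, so z = (139.5 – 150)/6.1237 ≈ –1.71.
P(z < –1.71) = 0.0436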
The probability that fewer than 140 are in favor of
government regulation is approximately 0.0436.
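For the survey example, the normal approximation can be checked against the exact binomial sum over 0, 1, 2 … 139. The sketch assumes Python 3.8+; `binom_pmf` is a helper defined here, and the exact sum plays the role of a calculator's binomial CDF command.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 200, 0.75
q = 1 - p

# "Fewer than 140" means 0, 1, ..., 139 successes.
exact = sum(binom_pmf(k, n, p) for k in range(140))

# Normal approximation with the correction for continuity: P(x < 139.5).
approx = NormalDist(n * p, sqrt(n * p * q)).cdf(139.5)

print(round(exact, 4), round(approx, 4))
```

The approximation lands within a few thousandths of the exact value, which is usually adequate when np and nq are both comfortably above 5.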
Editor's Notes
Tell students that there are other bell-shaped curves. The normal distributions are graphed with specific mathematical functions. By identifying the points of inflection, students can roughly determine the standard deviation.
Have students find the means of 11, 15.5 and 21 for the top 3 curves. The standard deviation for each is one-half.
For the lower 3 curves the means are 10, 15.5 and 21. The curve with the largest standard deviation is in the center. The one with the smallest is on the right.
The middle curve on top has the same mean but different standard deviation from the middle curve on bottom.
This rule has been discussed earlier. Emphasize that there is still 0.3% of the distribution falling outside the 3 standard deviation limits.
A good chance to review probabilities. Find the probability an assembly time will be between 3.6 and 4.5. Less than 4.5. Greater than 3.3 hours
This concept was introduced in Chapter 2. The z-score is a measure of position.
When each value of a normal distribution is standardized, the standard normal distribution is produced. If students are using tables, they must standardize all values to find probabilities. If students are using a technology tool, this will not be necessary.
As the value of z increases the cumulative area increases to one.
Tell students it is a good idea to sketch the curve and indicate the area to be found.
This is a “less than” example.
This is a “greater than” example. Students must compute the complementary area.
This is a “between” example. Tell students to be sure to subtract the smaller area from the larger area since areas (and probabilities) cannot be negative.
Using the cumulative distribution function, the calculation of probabilities is greatly simplified to three possibilities. If you are using a 0-to-z approach, skip these slides. With technology, use the CDF command to calculate cumulative probabilities.
Recall that in a discrete probability distribution, we could use the area of the bar in the probability histogram to obtain the probability of the event. Here we can only find the probability that x will lie in a given interval.
The area is the same
Be sure to emphasize that here, the area is given. Tell students to choose the z-score closest to the given area. The only exception is when the area falls exactly at the midpoint between two z-scores; in that case, use the midpoint of the z-scores.
Because the normal distribution is symmetric, the z scores will have the same absolute value. As a result, you can find one z-score and use its opposite for the other.
Show students that the formula given is equivalent to the z-score formula. Some students prefer to use only one formula and others like to use both. Have students work these through before displaying the answers. Emphasize the meaning of z-scores. A z-score of 2.33 is 2.33 standard deviations above the mean.
Students find these “cut-off” problems easier if they think in terms of percentiles, which in turn are interpreted as cumulative areas.
Each sample has the same n. Emphasize that sample means will vary from one sample to another but are not expected to be too far from the population mean. Other statistics such as the sample variance have their own sampling distributions that will be studied later.
This theorem is the foundation for inferential statistics. As long as the sample has at least 30 values, the sampling distribution of the mean will be approximately normal. The center of the sampling distribution is the same as the center of the distribution of individual values. The variation is smaller. The larger the sample size, the smaller the variation will be.
When the original population is normally distributed, the sample can be any size for a normal sampling distribution.
Although the probability that one man might be more than 70 inches tall is P(z > 0.28) = 1 – .6103 = .3897, the probability that the mean of a sample of 60 men will be greater than 70 is 0.0162.
The table for calculating probabilities is limited to specific values of p and values of n that do not exceed 20. This application will show how to calculate binomial probabilities when the table cannot be used and the binomial probability formula becomes too tedious. Even technology tools such as Minitab have limitations in calculating these probabilities.
Review the formulas for calculating the mean and standard deviation of a binomial distribution. These must be found in order to specify the normal distribution.
We have to ensure a large enough sample size. The minimum size depends on n and on p as well. When p is closer to .5, the curve is more symmetric and we require a smaller sample to approximate the normal distribution.
The continuous interval from 13.5 to 16.5 has approximately the same area as the rectangles whose centers are 14, 15 and 16.
The normal probability can be used to approximate the discrete binomial probability.
Students will agree that it is extremely impractical to use the binomial formulas for calculating the probability of exactly 0, exactly 1, exactly 2 … exactly 139 successes. It is often helpful to have students list the possible values (for example 0, 1, 2 … 139). This helps determine the interval. The reason for not having to adjust the left-hand limit is that the area is almost 0 at the extremes of the curve.
Using the TI-83, binomcdf(200, .75, 139) the binomial probability is given as .0453885607.
If n = 2000 however, the TI-83 gives a domain error message. This means to calculate the probability students will need to use the normal approximation for the binomial.