Irrespective of the shape of the underlying population distribution, sample means and sample proportions will be approximately normally distributed if the sample size is sufficiently large.
How can I tell the shape of the underlying population?
CHECK FOR NORMALITY:
Use descriptive statistics. Construct stem-and-leaf plots for small or moderate-sized data sets and frequency distributions and histograms for large data sets.
Compute measures of central tendency (mean and median) and compare them with the theoretical and practical properties of the normal distribution, in which the mean and median coincide. Compute the interquartile range. Does it approximate 1.33 times the standard deviation?
How are the observations in the data set distributed? Do approximately two-thirds of the observations lie within plus or minus 1 standard deviation of the mean? Do approximately four-fifths of the observations lie within plus or minus 1.28 standard deviations of the mean? Do approximately 19 out of every 20 observations lie within plus or minus 2 standard deviations of the mean?
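The checks above can be sketched in a few lines of Python. This is only an illustrative helper: the function name `normality_summary` and its output keys are assumptions made for this example, and the benchmark values in the comments are the ones quoted in the text.

```python
import statistics

def normality_summary(data):
    """Descriptive checks to compare against normal-distribution benchmarks."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)                   # sample standard deviation
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points

    def share_within(k):
        # Fraction of observations within k standard deviations of the mean
        return sum(1 for x in data if abs(x - mean) <= k * s) / len(data)

    return {
        "mean": mean,
        "median": median,
        "iqr_over_s": (q3 - q1) / s,            # compare with roughly 1.33
        "within_1_sd": share_within(1.0),       # compare with about 2/3
        "within_1.28_sd": share_within(1.28),   # compare with about 4/5
        "within_2_sd": share_within(2.0),       # compare with about 19/20
    }
```

Large departures from the benchmarks suggest the data are not normally distributed; small samples will naturally wander from them somewhat.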
Why do I care if X-bar, the sample mean, is normally distributed?
Because I want to use Z scores to analyze sample means.
But to use Z scores, the data must be normally distributed.
That’s where the Central Limit Theorem steps in.
Recall that the Central Limit Theorem states that sample means are approximately normally distributed regardless of the shape of the underlying population if the sample size is sufficiently large.
To determine µ X-bar, the mean of the sample means, directly, we would need to draw all possible samples of the given size from the population, compute the sample means, and average them. This task is unrealistic. Fortunately, µ X-bar equals the population mean µ, which is easier to access.
Likewise, to compute the value of σ X-bar directly, we would have to take all possible samples of a given size from the population, compute the sample means, and determine the standard deviation of the sample means. This task is also unrealistic. Fortunately, σ X-bar can be computed as the population standard deviation divided by the square root of the sample size.
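A small simulation can make these two shortcuts plausible. The sketch below is not from the slides: it assumes a skewed exponential population with mean 1 and standard deviation 1, a sample size of 36, and 5,000 random samples standing in for "all possible samples".

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 36            # hypothetical sample size
reps = 5000       # number of random samples drawn (not all possible samples)
mu = sigma = 1.0  # an exponential population with rate 1 has mean 1 and sd 1

# Draw many samples from the skewed population and record each sample mean
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (mu = {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} "
      f"(sigma / sqrt(n) = {sigma / math.sqrt(n):.3f})")
```

The two printed pairs should be close to each other, which is what µ X-bar = µ and σ X-bar = σ / √n predict.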
For 95% confidence, α = .05 and α / 2 = .025. The value of Z.025 is found by looking in the standard normal table under .5000 - .025 = .4750. This area in the table is associated with a Z value of 1.96.
An alternate method: multiply the confidence level, 95%, by ½ (since the distribution is symmetric and the intervals are equal on each side of the population mean).
(½)(95%) = .4750, the area on each side of the mean, which corresponds to a Z value of 1.96.
In other words, of all the possible X-bar values along the horizontal axis of the normal distribution curve, 95% of them should be within a Z score of 1.96 from the mean.
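The table lookup can be reproduced with Python's standard library, as a quick cross-check rather than a replacement for the table. The variable names are chosen for this example only.

```python
from statistics import NormalDist

confidence = 0.95
alpha = 1 - confidence

# Z value that leaves alpha / 2 in the upper tail of the standard normal
z = NormalDist().inv_cdf(1 - alpha / 2)

# Area between the mean and z, the quantity the table is indexed by
area_mean_to_z = NormalDist().cdf(z) - 0.5

print(round(z, 2))               # 1.96
print(round(area_mean_to_z, 4))  # 0.475
```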
A business analyst for a cellular telephone company takes a random sample of 85 bills for a recent month and from these bills computes a sample mean of 153 minutes. If the company uses the sample mean of 153 minutes as an estimate for the population mean, then the sample mean is being used as a POINT ESTIMATE. Past history and similar studies indicate that the population standard deviation is 46 minutes.
The value of Z is decided by the level of confidence desired. A confidence level of 95% has been selected.
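The interval for this example follows from X-bar ± Z σ / √n. A minimal sketch of the arithmetic, using the figures given above and the table value Z = 1.96:

```python
import math

x_bar = 153   # sample mean, in minutes
sigma = 46    # population standard deviation, in minutes
n = 85        # sample size
z = 1.96      # table value for 95% confidence

margin = z * sigma / math.sqrt(n)          # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"{lower:.2f} to {upper:.2f}, margin {margin:.2f}")
# 143.22 to 162.78, margin 9.78
```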
For the previous 95% confidence interval, the following conclusions are valid:
I am 95% confident that the average length of a call for the population, µ, lies between 143.22 and 162.78 minutes.
If I repeatedly obtained samples of size 85, then 95% of the resulting confidence intervals would contain µ and 5% would not. QUESTION: Does this confidence interval [143.22 to 162.78] contain µ? ANSWER: I don’t know. All I can say is that this procedure leads to an interval containing µ 95% of the time.
I am 95% confident that my estimate of µ [namely 153 minutes] is within 9.78 minutes of the actual value of µ. RECALL: 9.78 is the margin of error.
Be Careful! The following statement is NOT true:
“The probability that µ lies between 143.22 and 162.78 is .95.”
Once you have inserted your sample results into the confidence interval formula, the word PROBABILITY can no longer be used to describe the resulting confidence interval.
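The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a normal population and a true mean of 150 minutes; that value is purely hypothetical, since in practice µ is unknown. Each individual interval either contains µ or it does not; the 95% describes the procedure over many samples.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

mu = 150.0                    # hypothetical true mean (unknown in practice)
sigma, n, z = 46.0, 85, 1.96  # figures from the cellular telephone example
reps = 2000                   # number of repeated samples
margin = z * sigma / math.sqrt(n)

hits = 0
for _ in range(reps):
    # Draw one sample of size n and build its 95% confidence interval
    x_bar = statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1

coverage = hits / reps
print(f"share of intervals containing mu: {coverage:.3f}")
```

The printed share should be close to .95, though it will vary a little with the seed and the number of repetitions.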
Confidence Interval Estimation of the Mean (σ Unknown)
In reality, the actual standard deviation of the population, σ, is usually unknown.
Therefore, we use “s” (sample standard deviation) to compute the confidence interval for the population mean, µ.
However, by using “s” in place of σ, the standard normal Z distribution no longer applies.
Fortunately, the t-distribution will work, provided the population from which we obtain the sample is normally distributed.
To construct a 90% confidence interval to estimate the average amount of extra time per week worked by a manager in the aerospace industry, I assume that comp time is normally distributed in the population.
The sample size is 18, so df = 17.
A 90% level of confidence results in an area of α / 2 = .05 in each tail.
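For reference, the interval takes the usual t form shown below. The critical value 1.740 is the standard t-table entry for an upper-tail area of .05 with 17 degrees of freedom; the sample mean and sample standard deviation are left as symbols because they are not given here.

```latex
\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\qquad \text{with } n = 18,\quad df = n - 1 = 17,\quad t_{.05,\,17} = 1.740
```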
From these figures, the aerospace industry could attempt to build a reward system for such extra work or evaluate the regular 40-hour week to determine how to use the normal work hours more effectively and thus reduce comp time.
I own a large equipment rental company and I want to make a quick estimate of the average number of days a piece of ditch-digging equipment is rented out per rental. The company has records of all rentals, but the amount of time required to conduct an audit of all accounts would be prohibitive.
I decide to take a random sample of rental invoices.
Fourteen different rentals of ditch diggers are selected randomly from the files.
Use the following data to construct a 99% confidence interval to estimate the average number of days that a ditch digger is rented and assume that the number of days per rental is normally distributed in the population.
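Once the fourteen observations are in hand, the interval can be computed as sketched below. The function name `t_interval_from_data` is an assumption for this example, and the critical value is supplied by the caller: for 99% confidence with df = 14 − 1 = 13, the t table gives 3.012.

```python
import math
import statistics

def t_interval_from_data(data, t_crit):
    """Confidence interval for the mean from raw data, sigma unknown."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)           # sample standard deviation (n - 1)
    margin = t_crit * s / math.sqrt(n)   # margin of error
    return x_bar - margin, x_bar + margin

# t table value for .005 in each tail with 13 degrees of freedom
T_CRIT_99_DF13 = 3.012
```

Calling `t_interval_from_data(days, T_CRIT_99_DF13)` with the list of rental days returns the lower and upper limits of the interval.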
A clothing company produces men’s jeans. The jeans are made and sold with either a regular cut or a boot cut.
In an effort to estimate the proportion of their men’s jeans market in Oklahoma City that is for boot-cut jeans, the analyst takes a random sample of 212 jeans sales from the company’s two Oklahoma City retail outlets.
Only 34 of the sales were for boot-cut jeans.
Construct a 90% confidence interval to estimate the proportion of the population in Oklahoma City who prefer boot-cut jeans.
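A minimal sketch of the calculation, assuming the usual large-sample interval for a proportion, p-hat ± Z √(p-hat (1 − p-hat) / n):

```python
import math
from statistics import NormalDist

x, n = 34, 212                  # boot-cut sales and total sales sampled
p_hat = x / n                   # sample proportion

z = NormalDist().inv_cdf(0.95)  # 90% confidence leaves .05 in each tail
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.3f}, interval {lower:.3f} to {upper:.3f}")
# p_hat = 0.160, interval 0.119 to 0.202
```

Using the rounded table value Z = 1.645 instead of the computed one gives the same limits to three decimal places.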
Know the desired confidence level, which determines the value of Z (the critical value from the standardized normal distribution). Determining the confidence level is subjective.
Know the acceptable sampling error, e: the amount of error that can be tolerated.
Know the standard deviation, σ. If it is unknown, estimate it by:
σ ≈ range/4. This estimate is derived from the empirical rule stating that approximately 95% of the values in a normal distribution are within +/- 2σ of the mean, so the range containing most of the values spans about 4σ.
Suppose the marketing manager wishes to estimate the population mean annual usage of home heating oil to within +/- 50 gallons of the true value, and he wants to be 95% confident of correctly estimating the true mean.
On the basis of a study taken the previous year, he believes that the standard deviation can be estimated as 325 gallons.
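With these three inputs, the required sample size follows from n = Z²σ² / e², rounded up to the next whole number. A short sketch of the arithmetic for this example:

```python
import math

z = 1.96      # table value for 95% confidence
sigma = 325   # estimated standard deviation, in gallons
e = 50        # acceptable sampling error, in gallons

n_raw = (z ** 2) * (sigma ** 2) / (e ** 2)
n = math.ceil(n_raw)   # always round up so the error bound is still met

print(round(n_raw, 2), n)  # 162.31 163
```

Rounding up rather than to the nearest integer keeps the sampling error at or below the target.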