Statistical inference is the process of using sample results to draw conclusions about the characteristics of the population.
Inferential statistics enables us to estimate an unknown population mean or population proportion.
Two types of estimates: point estimates and interval estimates.
A confidence interval estimate is a range of numbers, called an interval, constructed around the point estimate.
POINT AND INTERVAL ESTIMATES
A point estimate is a single value (statistic) used to estimate a population value (parameter).
An interval estimate states the range within which a population parameter probably lies.
A confidence interval is a range of values within which the population parameter is expected to occur. The two confidence levels used most extensively are 95% and 99%.
Factors that determine the width of a confidence interval
1. The sample size, n
2. The variability in the population, usually estimated by s
3. The desired level of confidence
How is the formula derived? It comes from the z test formula: z = (X̄ − µ) / (σ / √n). Rearranging this formula for µ gives µ = X̄ ± z · σ / √n, where z is taken from the z table.
8.1 Confidence interval for a mean (σ known): X̄ ± Z · σ / √n, or equivalently X̄ − Z · σ / √n ≤ µ ≤ X̄ + Z · σ / √n. Refer to page 327 for Example 8.1.
EXAMPLE 3 The Dean of the Business School wants to estimate the mean number of hours worked per week by students. A sample of 49 students showed a mean of 24 hours with a standard deviation of 4 hours. What is the population mean? The value of the population mean is not known. Our best estimate of this value is the sample mean of 24.0 hours. This value is called a point estimate.
95 percent confidence interval for the population mean: 24.0 ± 1.96 · (4 / √49) = 24.0 ± 1.12. The confidence limits range from 22.88 to 25.12. About 95 percent of similarly constructed intervals include the population parameter. From the z table: 95% of the area lies between z = -1.96 and z = 1.96, with 2.5% in each tail.
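The interval in Example 3 can be reproduced with a short Python sketch. The function name mean_confidence_interval is illustrative only, and the sketch assumes, as the example does, that the standard deviation of 4 hours can be used with the z value because the sample is large.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, std_dev, n, confidence=0.95):
    """Return (lower, upper) limits of a z-based confidence interval for a mean."""
    # Two-sided critical value: (1 - confidence) / 2 is left in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_dev / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Example 3: n = 49 students, mean 24 hours, standard deviation 4 hours.
lower, upper = mean_confidence_interval(24.0, 4.0, 49)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Raising the confidence argument to 0.99 widens the interval, and a larger n narrows it, which matches the factors listed for the width of a confidence interval.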
8.2 Confidence interval for a mean (σ unknown) If the population σ is unavailable, we need to develop a confidence interval estimate of µ using only the sample statistics, the mean and standard deviation (X̄ and S). We then use Student’s t distribution instead of the Z value. Refer to table E.3, page 330.
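Confidence interval for a mean (σ unknown): X̄ ± t · S / √n, or equivalently X̄ − t · S / √n ≤ µ ≤ X̄ + t · S / √n, where t is the critical value of the t distribution with n − 1 degrees of freedom. For examples, refer to pages 332 and 333.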
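8.3 Confidence interval estimation for the proportion: p ± Z · √(p(1 − p) / n), or equivalently p − Z · √(p(1 − p) / n) ≤ π ≤ p + Z · √(p(1 − p) / n), where p is the sample proportion. Refer to Example 8.4, page 340.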
EXAMPLE 4 A sample of 500 executives who own their own home revealed 175 planned to sell their homes and retire to Arizona. Develop a 98% confidence interval for the proportion of executives that plan to sell and move to Arizona.
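Example 4 is stated without a worked solution here, so the following Python sketch shows one way to carry it out. It assumes the usual large-sample interval p ± Z · √(p(1 − p) / n); the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Return (p, lower, upper) for a z-based confidence interval for a proportion."""
    p = successes / n
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p, p - margin, p + margin

# Example 4: 175 of 500 executives plan to sell and move to Arizona.
p_hat, lower, upper = proportion_confidence_interval(175, 500, 0.98)
print(f"p = {p_hat:.2f}, 98% CI: {lower:.3f} to {upper:.3f}")
```

The sample proportion is .35 and the 98% limits come out at roughly .30 and .40.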
8.4 Determining Sample Size
8.4.1 How to determine the right sample size for the mean? n = Z²σ² / e², where e = the acceptable sampling error
8.4.2 Sample size determination for the proportion: n = Z²π(1 − π) / e²
If nothing is known about the population proportion π, use π = 0.5 when determining the sample size.
To determine the sample size, you must know three factors:
The desired confidence level, which determines the value of Z, the critical value from the standardized normal distribution
The acceptable sampling error
The standard deviation or population proportion
EXAMPLE 6 A consumer group would like to estimate the mean monthly electricity charge for a single family house in July within $5 using a 99 percent level of confidence. Based on similar studies the standard deviation is estimated to be $20.00 . How large a sample is required?
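A minimal Python sketch of the sample size calculation for Example 6 follows. It assumes the standard formulas n = Z²σ² / e² for a mean and n = Z²π(1 − π) / e² for a proportion, and rounds up to the next whole observation. The function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided critical value from the standardized normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_for_mean(sigma, error, confidence):
    """Smallest n so that the sampling error for the mean is at most `error`."""
    z = z_for_confidence(confidence)
    return ceil((z * sigma / error) ** 2)

def sample_size_for_proportion(error, confidence, pi=0.5):
    """Smallest n for a proportion; pi = 0.5 is the choice when pi is unknown."""
    z = z_for_confidence(confidence)
    return ceil(z ** 2 * pi * (1 - pi) / error ** 2)

# Example 6: within $5, 99 percent confidence, standard deviation about $20.
n_required = sample_size_for_mean(sigma=20.0, error=5.0, confidence=0.99)
print(n_required)
```

The unrounded result is a little over 106, so a sample of 107 households is required. Rounding up rather than to the nearest integer keeps the sampling error within the stated limit.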
CHAPTER 9 Fundamentals of hypothesis testing: one-sample tests
WHAT IS A HYPOTHESIS?
A statement about the value of a population parameter developed for the purpose of testing.
Examples: The mean monthly income for systems analysts is $6,325. Twenty percent of all customers at Bovine’s Chop House return for another meal within a month.
WHAT IS HYPOTHESIS TESTING? Hypothesis testing is a procedure, based on sample evidence and probability theory, used to determine whether the hypothesis is a reasonable statement and should not be rejected, or is unreasonable and should be rejected.
Step One: State the null and alternate hypotheses
Null Hypothesis H0: A statement about the value of a population parameter.
Alternative Hypothesis H1: A statement that is accepted if the sample data provide evidence that the null hypothesis is false.
3 HYPOTHESES ABOUT MEANS Step One: State the null and alternate hypotheses. Three possibilities regarding means:
1. H0: µ = µ0, H1: µ ≠ µ0
2. H0: µ ≤ µ0, H1: µ > µ0
3. H0: µ ≥ µ0, H1: µ < µ0
The null hypothesis always contains equality.
STEP TWO: SELECT A LEVEL OF SIGNIFICANCE.
Level of Significance: The probability of rejecting the null hypothesis when it is actually true; the level of risk in so doing.
Type I Error: Rejecting the null hypothesis when it is actually true. Type I error is under your control.
Type II Error: Accepting the null hypothesis when it is actually false.
RISK TABLE Step Two: Select a Level of Significance.
Null Hypothesis    Researcher Accepts H0    Researcher Rejects H0
H0 is true         Correct decision         Type I error
H0 is false        Type II error            Correct decision
Critical z values for one-tail and two-tail tests
Level of significance α (Type I error)    Confidence level    Z value for two-tail test    Z value for one-tail test
0.01                                      99%                 p = 0.005, z = 2.58          p = 0.01, z = 2.33
0.05                                      95%                 p = 0.025, z = 1.96          p = 0.05, z = 1.65
0.10                                      90%                 p = 0.05, z = 1.65           p = 0.10, z = 1.28
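The critical values in this table can be recomputed rather than looked up. The sketch below uses Python's standard library; the dictionary name critical is illustrative. A two-tail test places α / 2 in each tail, while a one-tail test places all of α in one tail.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
critical = {}
for alpha in (0.01, 0.05, 0.10):
    two_tail = std_normal.inv_cdf(1 - alpha / 2)  # alpha / 2 in each tail
    one_tail = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
    critical[alpha] = (two_tail, one_tail)
    print(f"alpha = {alpha:.2f}  two-tail z = {two_tail:.3f}  one-tail z = {one_tail:.3f}")
```

The computed values agree with the table once rounded to two decimals in the table's style, for example 1.645 is shown as 1.65.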
STEP THREE: SELECT THE TEST STATISTIC.
Test statistic: A value, determined from sample information, used to determine whether or not to reject the null hypothesis. Examples: z, t, F, χ²
z Distribution as a test statistic: z = (X̄ − µ) / (σ / √n). The z value is based on the sampling distribution of X̄, which is normally distributed when the sample is reasonably large (recall the Central Limit Theorem).
Step Four: Formulate the decision rule. Critical value: The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected.
[Figure: Sampling distribution of the statistic z, a right-tailed test, .05 level of significance]
DECISION RULE Reject the null hypothesis and accept the alternate hypothesis if the computed z < -(critical z) or the computed z > critical z.
USING THE P-VALUE IN HYPOTHESIS TESTING
p-Value: The probability, assuming that the null hypothesis is true, of finding a value of the test statistic at least as extreme as the computed value for the test. It is calculated from the probability distribution function or by computer.
Decision Rule: If the p-Value is smaller than the significance level, α, H0 is rejected. If the p-Value is larger than or equal to the significance level, α, H0 is not rejected. Refer to Example 9.4, page 382.
Interpreting p-values, in order of decreasing p-value: SOME evidence H0 is not true; STRONG evidence H0 is not true; VERY STRONG evidence H0 is not true.
ONE-TAILED TESTS OF SIGNIFICANCE The alternate hypothesis, H1, states a direction.
H1: The mean yearly commission earned by full-time realtors is more than $35,000. (µ > $35,000)
H1: The mean speed of trucks traveling on I-95 in Georgia is less than 60 miles per hour. (µ < 60)
H1: Less than 20 percent of the customers pay cash for their gasoline purchase. (π < .20)
Two-Tailed Tests of Significance
No direction is specified in the alternate hypothesis H1.
H1: The mean price for a gallon of gasoline is not equal to $1.54. (µ ≠ $1.54)
H1: The mean amount spent by customers at the Wal-mart in Georgetown is not equal to $25. (µ ≠ $25)
[Figure: Regions of nonrejection and rejection for a two-tailed test, .05 level of significance]
TESTING FOR THE POPULATION MEAN: LARGE SAMPLE, POPULATION STANDARD DEVIATION KNOWN The test statistic is z = (X̄ − µ) / (σ / √n).
EXAMPLE 1 The processors of Fries’ Catsup indicate on the label that the bottle contains 16 ounces of catsup. The standard deviation of the process is 0.5 ounces. A sample of 36 bottles from last hour’s production revealed a mean weight of 16.12 ounces per bottle. At the .05 significance level is the process out of control? That is, can we conclude that the mean amount per bottle is different from 16 ounces? µ = 16 ounces, σ = 0.5 ounces, n = 36 bottles, X̄ = 16.12 ounces, α = 0.05
Step 1 State the null and the alternative hypotheses. H0: µ = 16, H1: µ ≠ 16
Step 2 Select the significance level. The significance level is .05.
Step 3 Identify the test statistic. Because we know the population standard deviation, the test statistic is z.
Step 4 State the decision rule. This is a two-tailed test: reject H0 if z > 1.96 or z < -1.96, or if the p-value < .05.
Step 5 Make a decision and interpret the results. z = (16.12 − 16) / (0.5 / √36) = 1.44. The two-tailed p-value, 2 · P(z > 1.44), is .1499. Because the computed z of 1.44 < critical z of 1.96 and the p-value of .1499 > α of .05, do not reject the null hypothesis. We cannot conclude the mean is different from 16 ounces.
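The five steps for Example 1 can be checked with a short Python sketch. The helper name z_test_mean and its tail argument are illustrative, not part of the course material.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sigma, n, tail="two"):
    """Return (z, p_value) for a one-sample z test of a mean."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "two":
        p_value = 2 * min(cdf, 1 - cdf)  # both tails count as extreme
    elif tail == "right":
        p_value = 1 - cdf
    else:  # "left"
        p_value = cdf
    return z, p_value

# Example 1: mean 16.12 oz from 36 bottles, sigma 0.5 oz, H0: mu = 16.
z, p_value = z_test_mean(16.12, 16.0, 0.5, 36, tail="two")
alpha = 0.05
reject_h0 = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The p-value rule and the critical value rule always agree: here the p-value exceeds .05 exactly because the computed z falls inside the interval from -1.96 to 1.96.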
TESTING FOR THE POPULATION MEAN: LARGE SAMPLE, POPULATION STANDARD DEVIATION UNKNOWN Here σ is unknown, so we estimate it with the sample standard deviation s. As long as the sample size n > 30, z can be approximated using z = (X̄ − µ) / (s / √n).
EXAMPLE 2 Roder’s Discount Store chain issues its own credit card. Lisa, the credit manager, wants to find out if the mean monthly unpaid balance is more than $400. The level of significance is set at .05. A random check of 172 unpaid balances revealed the sample mean to be $407 and the sample standard deviation to be $38. Should Lisa conclude that the population mean is greater than $400, or is it reasonable to assume that the difference of $7 ($407-$400) is due to chance?
EXAMPLE 2
Step 1 H0: µ ≤ $400, H1: µ > $400
Step 2 The significance level is .05.
Step 3 Because the sample is large we can use the z distribution as the test statistic.
Step 4 H0 is rejected if z > 1.65 or if the p-value < .05.
Step 5 Make a decision and interpret the results. z = ($407 − $400) / ($38 / √172) = 2.42. The p(z > 2.42) is .0078 for a one-tailed test. Because the computed z of 2.42 > critical z of 1.65 and the p-value of .0078 < α of .05, reject H0. Lisa can conclude that the mean unpaid balance is greater than $400.
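As a check on Example 2, the sketch below applies both decision rules in Python: the comparison with the critical value and the comparison of the p-value with α. Variable names are illustrative, and s stands in for σ as described above for large samples.

```python
from math import sqrt
from statistics import NormalDist

# Example 2: 172 unpaid balances, mean $407, sample standard deviation $38.
n, sample_mean, s = 172, 407.0, 38.0
mu0, alpha = 400.0, 0.05

z = (sample_mean - mu0) / (s / sqrt(n))      # s estimates sigma since n > 30
critical_z = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
p_value = 1 - NormalDist().cdf(z)             # area to the right of z

reject_h0 = z > critical_z
print(f"z = {z:.2f}, critical z = {critical_z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```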
TESTING FOR A POPULATION MEAN: SMALL SAMPLE, POPULATION STANDARD DEVIATION UNKNOWN The test statistic follows the t distribution: t = (X̄ − µ) / (s / √n). The critical value of t is determined by its degrees of freedom, equal to n − 1.
EXAMPLE 3 The current rate for producing 5 amp fuses at Neary Electric Co. is 250 per hour. A new machine has been purchased and installed that, according to the supplier, will increase the production rate. The production hours are normally distributed. A sample of 10 randomly selected hours from last month revealed that the mean hourly production on the new machine was 256 units, with a sample standard deviation of 6 per hour. At the .05 significance level can Neary conclude that the new machine is faster?
Step 1 State the null and alternate hypotheses. H0: µ ≤ 250, H1: µ > 250
Step 2 Select the level of significance. It is .05.
Step 3 Find a test statistic. Use the t distribution since σ is not known and n < 30.
Step 4 State the decision rule. There are 10 − 1 = 9 degrees of freedom. The null hypothesis is rejected if t > 1.833 or, using the p-value, if p < .05.
Step 5 Make a decision and interpret the results. t = (256 − 250) / (6 / √10) = 3.162. The p(t > 3.162) is .0058 for a one-tailed test. Because the computed t of 3.162 > critical t of 1.833 and the p-value of .0058 < α of .05, reject H0. The mean number of fuses produced is more than 250 per hour.
Extracted from Excel: t test, one-tailed
Null hypothesis µ = 250
Level of significance α = 0.05
Sample size n = 10
Sample mean X̄ = 256
Sample SD s = 6
Intermediate calculations
Std error of the mean 1.8974
Degrees of freedom 9
t test statistic 3.1623
Upper-tail test
Upper critical value 1.8331
p-Value 0.0058
Reject the null hypothesis
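The Excel figures above can be approximated in plain Python. The standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule. That is an implementation choice made here for illustration, not the method Excel uses, and the function names are hypothetical.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the distribution lies above 0; subtract the area between 0 and t.
    return 0.5 - total * h / 3

# Example 3: n = 10 hours, mean 256 units, s = 6, H0: mu <= 250.
n, sample_mean, s, mu0 = 10, 256.0, 6.0, 250.0
std_error = s / sqrt(n)
t_stat = (sample_mean - mu0) / std_error
p_value = t_upper_tail(t_stat, df=n - 1)
print(f"std error = {std_error:.4f}, t = {t_stat:.4f}, p-value = {p_value:.4f}")
```

The standard error, t statistic, and p-value should match the Excel output to the precision shown there.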
Proportion: The fraction or percentage that indicates the part of the population or sample having a particular trait of interest. The sample proportion is p and π is the population proportion.
Test Statistic for Testing a Single Population Proportion: z = (p − π) / √(π(1 − π) / n)
EXAMPLE 4 In the past, 15% of the mail order solicitations for a certain charity resulted in a financial contribution. A new solicitation letter that has been drafted is sent to a sample of 200 people and 45 responded with a contribution. At the .05 significance level can it be concluded that the new letter is more effective?
EXAMPLE 4
Step 1 State the null and the alternate hypothesis. H0: π ≤ .15, H1: π > .15
Step 2 Select the level of significance. It is .05.
Step 3 Find a test statistic. The z distribution is the test statistic.
Step 4 State the decision rule. The null hypothesis is rejected if z is greater than 1.65 or if the p-value < .05.
Step 5 Make a decision and interpret the results. The sample proportion is p = 45 / 200 = .225, so z = (.225 − .15) / √(.15(1 − .15) / 200) = 2.97 and p(z > 2.97) = .0015. Because the computed z of 2.97 > critical z of 1.65 and the p-value of .0015 < α of .05, the null hypothesis is rejected. More than 15 percent are responding with a pledge. The new letter is more effective.
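A short Python sketch of the single-proportion test in Example 4 follows. It assumes the usual statistic z = (p − π) / √(π(1 − π) / n), with the standard error computed from the hypothesized π rather than from the sample proportion. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, pi0):
    """Return (p, z, right-tail p-value) for a one-sample test of a proportion."""
    p = successes / n
    # The standard error uses the hypothesized proportion pi0, not p.
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return p, z, p_value

# Example 4: 45 contributions from 200 letters, H0: pi <= .15.
p_hat, z, p_value = proportion_z_test(45, 200, 0.15)
reject_h0 = p_value < 0.05
print(f"p = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```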
CHAPTER 10 TWO-SAMPLE TESTS
COMPARING TWO POPULATIONS Does the distribution of the differences in sample means have a mean of 0? The samples are from independent populations. If both samples contain at least 30 observations we use the z distribution as the test statistic. No assumptions about the shape of the populations are required. The formula for computing the value of z is: z = (X̄1 − X̄2) / √(s1²/n1 + s2²/n2)
EXAMPLE 1 Two cities, Bradford and Kane, are separated only by the Conewango River. There is competition between the two cities. The local paper recently reported that the mean household income in Bradford is $38,000 with a standard deviation of $6,000 for a sample of 40 households. The same article reported the mean income in Kane is $35,000 with a standard deviation of $7,000 for a sample of 35 households. At the .01 significance level can we conclude the mean income in Bradford is more?
EXAMPLE 1 CONTINUED
Step 1 State the null and alternate hypotheses. H0: µB ≤ µK, H1: µB > µK
Step 2 State the level of significance. The .01 significance level is stated in the problem.
Step 3 Find the appropriate test statistic. Because both samples are more than 30, we can use z as the test statistic.
Step 4 State the decision rule. The null hypothesis is rejected if z is greater than 2.33 or if the p-value < .01.
Step 5: Compute the value of z and make a decision. z = ($38,000 − $35,000) / √($6,000²/40 + $7,000²/35) = 1.98. The p(z > 1.98) is .0239 for a one-tailed test of significance. Because the computed z of 1.98 < critical z of 2.33 and the p-value of .0239 > α of .01, the decision is to not reject the null hypothesis. We cannot conclude that the mean household income in Bradford is larger.
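The Bradford and Kane comparison can be verified with a few lines of Python. The sketch assumes the large-sample statistic z = (X̄1 − X̄2) / √(s1²/n1 + s2²/n2); variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 1: Bradford (sample 1) versus Kane (sample 2).
mean_b, sd_b, n_b = 38_000.0, 6_000.0, 40
mean_k, sd_k, n_k = 35_000.0, 7_000.0, 35
alpha = 0.01

# Standard error of the difference between two independent sample means.
std_error = sqrt(sd_b ** 2 / n_b + sd_k ** 2 / n_k)
z = (mean_b - mean_k) / std_error
p_value = 1 - NormalDist().cdf(z)  # right tail, since H1: mu_B > mu_K

reject_h0 = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

At the .05 level the same data would lead to rejection, which shows how much the conclusion depends on the significance level chosen in Step 2.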
Two Sample Tests of Proportions investigate whether two samples came from populations with an equal proportion of successes. The two samples are pooled using the following formula: pc = (X1 + X2) / (n1 + n2), where X1 and X2 refer to the number of successes in the respective samples of n1 and n2. The value of the test statistic is computed from the following formula: z = (p1 − p2) / √(pc(1 − pc)/n1 + pc(1 − pc)/n2), where p1 = X1 / n1 and p2 = X2 / n2 are the sample proportions.
EXAMPLE 2 Are unmarried workers more likely to be absent from work than married workers? A sample of 250 married workers showed 22 missed more than 5 days last year, while a sample of 300 unmarried workers showed 35 missed more than five days. Use a .05 significance level.
EXAMPLE 2 CONTINUED
The null and the alternate hypotheses: H0: πU ≤ πM, H1: πU > πM
The null hypothesis is rejected if the computed value of z is greater than 1.65 or the p-value < .05. The pooled proportion = (35 + 22) / (300 + 250) = .1036
EXAMPLE 2 CONTINUED z = (.1167 − .0880) / √(.1036(1 − .1036)/300 + .1036(1 − .1036)/250) = 1.10. The p(z > 1.10) = .136 for a one-tailed test of significance. Because the calculated z of 1.10 < critical z of 1.65 and the p-value of .136 > α of .05, the null hypothesis is not rejected. We cannot conclude that a higher proportion of unmarried workers miss more than five days in a year than the married workers.
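The pooled-proportion test for the absenteeism example can be sketched in Python as follows. The function name two_proportion_z_test is hypothetical; the formulas are the pooled proportion and z statistic described for two-sample tests of proportions.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (pooled proportion, z, right-tail p-value) for H1: pi1 > pi2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = (p1 - p2) / std_error
    return pooled, z, 1 - NormalDist().cdf(z)

# Example 2: unmarried workers (35 of 300) versus married workers (22 of 250).
pooled, z, p_value = two_proportion_z_test(35, 300, 22, 250)
reject_h0 = p_value < 0.05
print(f"pooled = {pooled:.4f}, z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```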
SMALL SAMPLE TESTS OF MEANS The t distribution is used as the test statistic if one or more of the samples have fewer than 30 observations. The required assumptions:
1. Both populations must follow the normal distribution.
2. The populations must have equal standard deviations.
3. The samples are from independent populations.
SMALL SAMPLE TEST OF MEANS CONTINUED Finding the value of the test statistic requires two steps.
Step One: Pool the sample variances. sp² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)
Step Two: Determine the value of t from the following formula. t = (X̄1 − X̄2) / √(sp²(1/n1 + 1/n2))
EXAMPLE 3 A recent EPA study compared the highway fuel economy of domestic and imported passenger cars. A sample of 15 domestic cars revealed a mean of 33.7 mpg with a standard deviation of 2.4 mpg. A sample of 12 imported cars revealed a mean of 35.7 mpg with a standard deviation of 3.9. At the .05 significance level can the EPA conclude that the mpg is higher on the imported cars?
EXAMPLE 3 CONTINUED
Step 1 State the null and alternate hypotheses. H0: µD ≥ µI, H1: µD < µI
Step 2 State the level of significance. The .05 significance level is stated in the problem.
Step 3 Find the appropriate test statistic. Both samples are less than 30, so we use the t distribution.
Step 4 The decision rule is to reject H0 if t < -1.708 or if the p-value < .05. There are n1 + n2 − 2 = 15 + 12 − 2 = 25 degrees of freedom.
Step 5 We compute the pooled variance. sp² = ((15 − 1)(2.4)² + (12 − 1)(3.9)²) / (15 + 12 − 2) = 9.918
EXAMPLE 3 CONTINUED We compute the value of t as follows. t = (33.7 − 35.7) / √(9.918(1/15 + 1/12)) = -1.640
EXAMPLE 3 CONTINUED P(t < -1.64) = .0567 for a one-tailed t-test. Since the computed t of -1.64 > critical t of -1.71 and the p-value of .0567 > α of .05, H0 is not rejected. There is insufficient sample evidence to claim a higher mpg on the imported cars.
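The two-step calculation for the fuel economy example can be written as a small Python sketch. The critical value -1.708 is taken from the decision rule in the example rather than computed, since the standard library has no t table, and the function name is illustrative.

```python
from math import sqrt

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (pooled variance, t, degrees of freedom), assuming equal variances."""
    df = n1 + n2 - 2
    # Step One: pool the two sample variances, weighted by degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Step Two: compute t for the difference between the sample means.
    t = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return pooled_var, t, df

# Example 3: domestic cars (sample 1) versus imported cars (sample 2).
pooled_var, t_stat, df = pooled_t_test(33.7, 2.4, 15, 35.7, 3.9, 12)
critical_t = -1.708  # left-tail value for 25 df at .05, as given in the example
reject_h0 = t_stat < critical_t
print(f"pooled variance = {pooled_var:.3f}, t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```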
HYPOTHESIS TESTING INVOLVING PAIRED OBSERVATIONS Independent samples are samples that are not related in any way. Dependent samples are samples that are paired or related in some fashion. If you wished to buy a car you would look at the same car at two (or more) different dealerships, for example Town and Country Cadillac and Downtown Cadillac, and compare the prices. If you wished to measure the effectiveness of a new diet you would weigh the dieters at the start and at the finish of the program.
Use the following test when the samples are dependent: t = d̄ / (s_d / √n), where d̄ is the mean of the differences, s_d is the standard deviation of the differences, and n is the number of pairs (differences).
EXAMPLE 4 An independent testing agency is comparing the daily rental cost for renting a compact car from Hertz and Avis. A random sample of eight cities revealed the following information. At the .05 significance level can the testing agency conclude that there is a difference in the rental charged?
City Hertz ($) Avis ($)
Atlanta 42 40
Chicago 56 52
Cleveland 45 43
Denver 48 48
Honolulu 37 32
Kansas City 45 48
Miami 41 39
Seattle 46 50
EXAMPLE 4 CONTINUED
Step 1 H0: µd = 0, H1: µd ≠ 0
Step 2 The stated significance level is .05.
Step 3 The appropriate test statistic is the paired t-test.
Step 4 H0 is rejected if t < -2.365 or t > 2.365, or if the p-value < .05. We use the t distribution with n − 1 = 7 degrees of freedom.
Step 5 Perform the calculations and make a decision.
EXAMPLE 4 CONTINUED
City Hertz Avis d d²
Atlanta 42 40 2 4
Chicago 56 52 4 16
Cleveland 45 43 2 4
Denver 48 48 0 0
Honolulu 37 32 5 25
Kansas City 45 48 -3 9
Miami 41 39 2 4
Seattle 46 50 -4 16
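EXAMPLE 4 CONTINUED Σd = 8 and Σd² = 78, so d̄ = 8 / 8 = 1.00, s_d = √((78 − 8²/8) / (8 − 1)) = 3.1623, and t = 1.00 / (3.1623 / √8) = 0.894.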
EXAMPLE 4 CONTINUED P(t > .894) = .20 for a one-tailed t-test at 7 degrees of freedom, so the two-tailed p-value is about .40. Because 0.894 is less than the critical value of 2.365 and the p-value exceeds α of .05, do not reject the null hypothesis. We cannot conclude that there is a difference in the mean amount charged by Hertz and Avis.
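The paired calculation for the Hertz and Avis data can be reproduced with Python's statistics module. The critical value 2.365 is taken from the decision rule in the example; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Daily rental cost ($) in the eight sampled cities, in the table's order.
hertz = [42, 56, 45, 48, 37, 45, 41, 46]
avis = [40, 52, 43, 48, 32, 48, 39, 50]

# The paired test works on the per-city differences only.
differences = [h - a for h, a in zip(hertz, avis)]
n = len(differences)
mean_d = mean(differences)
sd_d = stdev(differences)  # sample standard deviation (n - 1 in the denominator)
t_stat = mean_d / (sd_d / sqrt(n))

critical_t = 2.365  # two-tailed value for 7 df at .05, as given in the example
reject_h0 = abs(t_stat) > critical_t
print(f"mean d = {mean_d:.2f}, s_d = {sd_d:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Because each city serves as its own control, city-to-city price variation drops out of the differences, which is the advantage of dependent samples.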
COMPARING DEPENDENT AND INDEPENDENT SAMPLES
Two types of dependent samples:
1. Matched or paired observations
2. The same subjects measured at two different points in time
Advantage of dependent samples: Reduction in variation in the sampling distribution
Disadvantage of dependent samples: Degrees of freedom are halved