JM Statr session 13, Jan 11

Praxis Weekend Business Analytics

Published in: Education, Technology
  1. 1. Learning Objectives • Estimate a population mean from a sample mean when σ is known. • Estimate a population mean from a sample mean when σ is unknown. • Estimate a population proportion using the z statistic. • Use the chi-square distribution to estimate the population variance given the sample variance. • Determine the sample size needed in order to estimate the population mean and population proportion.
  2. 2. Estimating the Population Parameter • A point estimate is a statistic calculated from a sample that is used to estimate a population parameter. • Interval estimate - a range of values within which the analyst can declare, with some confidence, the population parameter lies.
  3. 3. Point Estimate of μ • Point estimate: $\bar{x} = \frac{\sum x}{n}$ • Point estimate is also called Estimator • Varies from sample to sample
  4. 4. Interval Estimate of μ • Because of variation in sample statistics, a population parameter is estimated using an Interval Estimate • An interval estimate (confidence interval) is a range of values within which the researcher feels, with some confidence, that the population mean lies
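  5. 5. Estimating the Population Mean using Interval Estimate • $\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$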
  6. 6. Finding out z value for 95% Confidence Interval
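  7. 7. A 95% Confidence Interval for Population Parameter • [Figure: standard normal curve. The central 95% of the area (0.4750 on each side of 0) lies between Z = -1.96 and Z = 1.96, with 0.025 in each tail.]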
  8. 8. Significance of Level of Confidence • What does a 95% Level of Confidence mean? • It means that if the research analyst were to randomly select 100 samples of a given size n and use each calculated sample mean to construct a 95% confidence interval, approximately 95 of the 100 confidence intervals would contain the population mean. • You will try out a practical example using R.
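The repeated-sampling reading of a confidence level can be checked with a small simulation. The sketch below is in Python rather than R and is not the session's lab exercise; the population values (mu = 50, sigma = 10), the sample size, and the function name `coverage` are all illustrative assumptions.

```python
import random
from statistics import NormalDist, mean


def coverage(num_samples=1000, n=30, mu=50.0, sigma=10.0, level=0.95, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(num_samples):
        # Draw one sample from the assumed normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / num_samples


print(coverage())  # should land close to 0.95
```

With only 100 samples, as on the slide, the observed fraction will wander more from run to run; a larger `num_samples` makes the 95% figure easier to see.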
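  9. 9. 95% Confidence Intervals for μ • [Figure: confidence intervals built around several different sample means X; about 95% of such intervals contain μ.]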
  10. 10. Values of z for common Levels of Confidence • Confidence Level 90%: Z Value 1.645 • Confidence Level 95%: Z Value 1.96 • Confidence Level 98%: Z Value 2.33 • Confidence Level 99%: Z Value 2.575 • Think: What happens to the length of Confidence Interval as the Confidence Level increases?
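The tabulated z values can be reproduced from the standard normal distribution. This is a minimal sketch using Python's standard library; the helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist


def z_critical(level):
    """Two-sided critical z value for a given confidence level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The computed value for 99% is about 2.576; the 2.575 on the slide comes from interpolating a printed z table. Because z grows with the confidence level, the interval gets longer as the confidence level increases.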
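  11. 11. 95% Confidence Intervals for μ • Given: $\bar{x} = 1300$, $\sigma = 160$, $n = 85$, $z_{\alpha/2} = 1.96$ • Lower limit: $\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1300 - 1.96\frac{160}{\sqrt{85}} = 1300 - 34.01 = 1265.99$ • Upper limit: $\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1300 + 1.96\frac{160}{\sqrt{85}} = 1300 + 34.01 = 1334.01$ • $1265.99 \le \mu \le 1334.01$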
  12. 12. Demonstration Problem 8.1 • A survey was taken of U.S. companies that do business with firms in India. One of the questions on the survey was: Approximately how many years has your company been trading with firms in India? A random sample of 44 responses to this question yielded a mean of 10.455 years. Suppose the population standard deviation for this question is 7.7 years. Using this information, construct a 90% confidence interval for the mean number of years that a company has been trading in India for the population of U.S. companies trading with firms in India.
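  13. 13. Demonstration Problem 8.1: Solution • Given: $\bar{x} = 10.455$, $\sigma = 7.7$, $n = 44$; 90% confidence, so $z = 1.645$ • Lower limit: $\bar{x} - z\frac{\sigma}{\sqrt{n}} = 10.455 - 1.645\frac{7.7}{\sqrt{44}} = 10.455 - 1.91 = 8.545$ • Upper limit: $\bar{x} + z\frac{\sigma}{\sqrt{n}} = 10.455 + 1.645\frac{7.7}{\sqrt{44}} = 10.455 + 1.91 = 12.365$ • $8.545 \le \mu \le 12.365$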
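The arithmetic in Demonstration Problem 8.1 can be checked with a few lines of Python. This is a sketch only; the function name `z_interval` is an assumption, and the z value is taken from the slide's table rather than computed.

```python
from math import sqrt


def z_interval(x_bar, sigma, n, z):
    """Confidence interval for the mean when sigma is known."""
    margin = z * sigma / sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin


# Values from Demonstration Problem 8.1 (90% confidence, z = 1.645).
low, high = z_interval(x_bar=10.455, sigma=7.7, n=44, z=1.645)
print(f"{low:.3f} <= mu <= {high:.3f}")
```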
  14. 14. Demonstration Problem 8.2 • A study is conducted in a company that employs 800 engineers. A random sample of 50 engineers reveals that the average sample age is 34.3 years. Historically, the population standard deviation of the age of the company’s engineers is approximately 8 years. Construct a 98% confidence interval to estimate the average age of all the engineers in this company.
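  15. 15. Demonstration Problem 8.2: Solution • Given: $\bar{x} = 34.3$, $\sigma = 8$, $N = 800$, $n = 50$; 98% confidence, so $z = 2.33$ • Lower limit: $\bar{x} - z\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 34.3 - 2.33\frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}} = 34.3 - 2.554 = 31.75$ • Upper limit: $\bar{x} + z\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 34.3 + 2.33\frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}} = 34.3 + 2.554 = 36.85$ • $31.75 \le \mu \le 36.85$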
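Problem 8.2 samples 50 of only 800 engineers, so the solution multiplies the standard error by the finite population correction factor $\sqrt{(N-n)/(N-1)}$. A minimal Python sketch of that calculation follows; the function name `z_interval_finite` is an assumption made for illustration.

```python
from math import sqrt


def z_interval_finite(x_bar, sigma, n, N, z):
    """z-interval for the mean with the finite population correction."""
    fpc = sqrt((N - n) / (N - 1))  # shrinks the margin when n is a sizable share of N
    margin = z * sigma / sqrt(n) * fpc
    return x_bar - margin, x_bar + margin


# Values from Demonstration Problem 8.2 (98% confidence, z = 2.33).
low, high = z_interval_finite(x_bar=34.3, sigma=8, n=50, N=800, z=2.33)
print(f"{low:.2f} <= mu <= {high:.2f}")
```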
  16. 16. What is t distribution? • A family of distributions -- a unique distribution for each value of its parameter, degrees of freedom (d.f.) • t distribution is used instead of the z distribution for doing inferential statistics on the population mean when the population Standard Deviation is unknown and the population is normally distributed • With the t distribution, you use the Sample Standard Deviation, s
  17. 17. t Distribution • A family of distributions - a unique distribution for each value of its parameter using degrees of freedom (d.f.), every sample size having a different distribution • $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
  18. 18. t Distribution Characteristics • t distributions are symmetric and unimodal with mean = 0; they are flatter in the middle and have more area in their tails than the normal distribution • t distribution approaches the normal curve as n becomes larger • t distribution is to be used when the Population Variance or Population Standard Deviation is unknown, regardless of the size of the sample
  19. 19. Robustness of t Distribution • Most statistical techniques have one or more underlying assumptions • If a technique is relatively insensitive to minor violations in one or more assumptions, the technique is said to be robust to that assumption. • t statistic for estimating a population mean is relatively robust to the assumption that the population is normally distributed
  20. 20. Reading the t Distribution table
  21. 21. t statistic: Degrees of Freedom (df) • For t statistic, df is n-1 • Degree of Freedom refers to the number of independent observations for a source of variation minus the number of independent parameters estimated in computing the variation • Number of independent observations = n • One independent parameter, population mean μ, is being estimated
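  22. 22. Confidence Intervals for μ of a Normal Population: Unknown σ • $\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$, or $\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$ • $df = n - 1$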
  23. 23. Table of Critical Values of t
      df     t0.100   t0.050   t0.025   t0.010   t0.005
      1      3.078    6.314    12.706   31.821   63.656
      2      1.886    2.920    4.303    6.965    9.925
      3      1.638    2.353    3.182    4.541    5.841
      4      1.533    2.132    2.776    3.747    4.604
      5      1.476    2.015    2.571    3.365    4.032
      ...
      23     1.319    1.714    2.069    2.500    2.807
      24     1.318    1.711    2.064    2.492    2.797
      25     1.316    1.708    2.060    2.485    2.787
      ...
      29     1.311    1.699    2.045    2.462    2.756
      30     1.310    1.697    2.042    2.457    2.750
      40     1.303    1.684    2.021    2.423    2.704
      60     1.296    1.671    2.000    2.390    2.660
      120    1.289    1.658    1.980    2.358    2.617
      ∞      1.282    1.645    1.960    2.327    2.576
      With df = 24 and α = 0.05, t = 1.711.
  24. 24. Demonstration Problem 8.3 • The owner of a large equipment rental company wants to make a rather quick estimate of the average number of days a piece of ditch digging equipment is rented out per person per time. The company has records of all rentals, but the amount of time required to conduct an audit of all accounts would be prohibitive. The owner decides to take a random sample of rental invoices. Fourteen different rentals of ditch diggers are selected randomly from the files, yielding the following data. She uses these data to construct a 99% confidence interval to estimate the average number of days that a ditch digger is rented and assumes that the number of days per rental is normally distributed in the population. • Data: 3 1 3 2 5 1 2 1 4 2 1 3 1 1
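  25. 25. Solution to Demonstration Problem 8.3 • Given: $\bar{x} = 2.14$, $s = 1.29$, $n = 14$, $df = n - 1 = 13$ • $\frac{\alpha}{2} = \frac{1 - 0.99}{2} = 0.005$, so $t_{.005,13} = 3.012$ • Lower limit: $\bar{x} - t\frac{s}{\sqrt{n}} = 2.14 - 3.012\frac{1.29}{\sqrt{14}} = 2.14 - 1.04 = 1.10$ • Upper limit: $\bar{x} + t\frac{s}{\sqrt{n}} = 2.14 + 3.012\frac{1.29}{\sqrt{14}} = 2.14 + 1.04 = 3.18$ • $1.10 \le \mu \le 3.18$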
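The solution to Problem 8.3 can be reproduced from the raw rental data. The sketch below uses Python's standard library, which has no t distribution, so the critical value 3.012 is copied from the slide's table lookup rather than computed; the names `days` and `t_interval` are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Rental durations (days) from Demonstration Problem 8.3.
days = [3, 1, 3, 2, 5, 1, 2, 1, 4, 2, 1, 3, 1, 1]

# Critical value t(0.005, df = 13), taken from the t table on the slide.
T_005_13 = 3.012


def t_interval(data, t_crit):
    """Confidence interval for the mean when sigma is unknown."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


low, high = t_interval(days, T_005_13)
print(f"x_bar = {mean(days):.2f}, s = {stdev(days):.2f}")
print(f"{low:.2f} <= mu <= {high:.2f}")
```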
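  26. 26. Confidence Interval to Estimate the Population Proportion • $\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$ • where: $\hat{p}$ = sample proportion, $\hat{q} = 1 - \hat{p}$, $p$ = population proportion, $n$ = sample size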
  27. 27. Demonstration Problem 8.5 A clothing company produces men’s jeans. The jeans are made and sold with either a regular cut or a boot cut. In an effort to estimate the proportion of their men’s jeans market in Oklahoma City that prefers boot-cut jeans, the analyst takes a random sample of 423 jeans sales from the company’s two Oklahoma City retail outlets. Only 72 of the sales were for boot-cut jeans. Construct a 90% confidence interval to estimate the proportion of the population in Oklahoma City who prefer boot-cut jeans.
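  28. 28. Solution for Demonstration Problem 8.5 • Given: $n = 423$, $x = 72$, $\hat{p} = \frac{x}{n} = \frac{72}{423} = 0.17$, $\hat{q} = 1 - \hat{p} = 1 - 0.17 = 0.83$ • 90% confidence, so $z = 1.645$ • Lower limit: $\hat{p} - z\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.17 - 1.645\sqrt{\frac{(0.17)(0.83)}{423}} = 0.17 - 0.03 = 0.14$ • Upper limit: $\hat{p} + z\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.17 + 1.645\sqrt{\frac{(0.17)(0.83)}{423}} = 0.17 + 0.03 = 0.20$ • $0.14 \le p \le 0.20$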
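The proportion interval for Problem 8.5 can be sketched the same way. The function name `proportion_interval` is an assumption; unlike the slide, the code does not round the sample proportion to 0.17 before computing the margin, so intermediate digits differ slightly while the rounded limits agree.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, level):
    """z-based confidence interval for a population proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Values from Demonstration Problem 8.5: 72 boot-cut sales out of 423.
low, high = proportion_interval(x=72, n=423, level=0.90)
print(f"{low:.2f} <= p <= {high:.2f}")
```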
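  29. 29. Estimating Population Variance • Population parameter: $\sigma^2$ • Estimator of $\sigma^2$: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ • $\chi^2$ formula for single variance: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$, with degrees of freedom $= n - 1$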
  30. 30. Chi-square statistic to estimate Population Variance • Extremely sensitive to the violations of the assumption that the population is normally distributed • This technique lacks robustness • Take extreme caution while constructing confidence interval
  31. 31. $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$ • $df = n - 1$ • $\alpha = 1 - \text{level of confidence}$
  32. 32. Two Table Values of χ2
      [Figure: chi-square curve for df = 7. An area of 0.95 lies to the right of 2.16735 and an area of 0.05 lies to the right of 14.0671.]
      Column headings give the area to the right of the tabulated value.
      df     0.950          0.050
      1      3.93219E-03    3.84146
      2      0.102586       5.99148
      3      0.351846       7.81472
      4      0.710724       9.48773
      5      1.145477       11.07048
      6      1.63538        12.5916
      7      2.16735        14.0671
      8      2.73263        15.5073
      9      3.32512        16.9190
      10     3.94030        18.3070
      ...
      20     10.8508        31.4104
      21     11.5913        32.6706
      22     12.3380        33.9245
      23     13.0905        35.1725
      24     13.8484        36.4150
      25     14.6114        37.6525
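The two df = 7 table values give a 90% confidence interval for the population variance once a sample variance is available. The slides supply no data for this case, so the sketch below uses a hypothetical sample variance (s² = 4.0 from n = 8 observations) purely for illustration; the function and constant names are also assumptions.

```python
# Chi-square table values for df = 7, copied from the slide.
CHI2_RIGHT_TAIL_05 = 14.0671   # 0.05 of the area lies to the right
CHI2_RIGHT_TAIL_95 = 2.16735   # 0.95 of the area lies to the right


def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2 from a sample variance s2."""
    df = n - 1
    # Dividing by the larger chi-square value gives the lower limit.
    return df * s2 / chi2_upper, df * s2 / chi2_lower


# Hypothetical inputs, not from the slides: s^2 = 4.0, n = 8 (so df = 7).
low, high = variance_interval(4.0, 8, CHI2_RIGHT_TAIL_05, CHI2_RIGHT_TAIL_95)
print(f"{low:.3f} <= sigma^2 <= {high:.3f}")
```

The interval is not symmetric around s², which reflects the skew of the chi-square distribution.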
  33. 33. Exercise in R: Confidence Intervals Open URL: www.openintro.org Go to Labs in R and select 4B - Confidence Levels
