# A04 Sample Size

• Power and Sample Size: Test for One Proportion. Testing proportion = 0.1 (versus not = 0.1). Alpha = 0.05.

| Alternative Proportion | Sample Size | Target Power | Actual Power |
| --- | --- | --- | --- |
| 0.12 | 2523 | 0.9 | 0.900079 |
| 0.15 | 438 | 0.9 | 0.900409 |
| 0.20 | 122 | 0.9 | 0.901723 |
| 0.25 | 59 | 0.9 | 0.903729 |
• ### Transcript

• 1. Sample Size Determination Deliverable 10A
• 2. Analyze Module Roadmap. Define: 1D – Define VOC, VOB, and CTQs; 2D – Define Project Boundaries; 3D – Quantify Project Value; 4D – Develop Project Mgmt. Plan. Measure: 5M – Document Process; 6M – Prioritize List of X’s; 7M – Create Data Collection Plan; 8M – Validate Measurement System; 9M – Establish Baseline Process Cap. Analyze: 10A – Determine Critical X’s. Green: 11G – Identify Root Cause Relationships. Improve: 12I – Prioritized List of Solutions; 13I – Pilot Best Solution. Control: 14C – Create Control System; 15C – Finalize Project Documentation.
• 3. Objectives – Sample Size <ul><li>Upon completion of this module, the student should be able to: </li></ul><ul><ul><li>List and define the variables which contribute to determining the correct sample size. </li></ul></ul><ul><ul><li>Calculate the appropriate sample size for a defined set of variables </li></ul></ul>
• 4. Key Variables in Sample Size <ul><ul><li>An optimal sample size is determined by four key factors: </li></ul></ul><ul><ul><ul><li>Alpha risk (α): The maximum risk the business is willing to take of rejecting the null hypothesis when it is true </li></ul></ul></ul><ul><ul><ul><li>Beta risk (β): The risk level of failing to reject the null hypothesis when it is false </li></ul></ul></ul><ul><ul><ul><li>Delta or difference (δ): The minimum difference we want to detect between populations </li></ul></ul></ul><ul><ul><ul><li>Proportion (p) or Standard Deviation (s): </li></ul></ul></ul><ul><ul><ul><ul><li>Proportion - Your best estimate of the defect rate with discrete data </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Standard Deviation - Your best estimate from available continuous data </li></ul></ul></ul></ul>
• 5. Alpha Risk (α) <ul><ul><li>Alpha risk is decided by the Black Belt </li></ul></ul><ul><ul><li>Our choice of α will determine when to reject the null hypothesis </li></ul></ul><ul><ul><ul><li>Typical α values for general business applications are between 0.05 and 0.10. As the cost of incorrect conclusions goes up, you may choose to lower α. </li></ul></ul></ul><ul><ul><ul><ul><li>e.g., pharmaceutical companies face significant consumer health risks and often use an α of 0.01 </li></ul></ul></ul></ul><ul><ul><ul><li>The α value should depend on practical considerations such as financial or safety risk, or risk to the customer </li></ul></ul></ul><ul><ul><li>“Confidence” is defined as 1 − α; α itself is the significance level </li></ul></ul>
• 6. Beta Risk (β) <ul><ul><li>Beta risk can be selected by the Black Belt, but we don’t control it the same way we do α risk. The best we can do is adjust sample size so that β is no greater than a specified value. </li></ul></ul><ul><ul><li>When a beta error (β) occurs, we have missed detecting a difference (good or bad). </li></ul></ul><ul><ul><li>Power is defined as (1 − β). It represents the probability that we can detect an important effect in the process </li></ul></ul><ul><ul><ul><li>Typical values of power in experiments are between 0.80 and 0.90 </li></ul></ul></ul><ul><ul><ul><li>We will use 0.90 for most work at JEA </li></ul></ul></ul>
• 7. Delta (δ) <ul><ul><li>Delta (δ) is the minimum change that needs to be detected during analysis </li></ul></ul><ul><ul><ul><li>Example: if the average cycle time to perform a laboratory test was 120 minutes, you as the supervisor may not be concerned if the average time shifted to 121 minutes, but you would want to know if it increased to 130 minutes. In this case, 10 minutes is the smallest increment of concern (δ = 10 minutes). </li></ul></ul></ul><ul><ul><li>It is the acceptable window of uncertainty around the estimate </li></ul></ul><ul><ul><li>As delta decreases (more precision), the sample size increases </li></ul></ul><ul><ul><li>As delta increases (less precision), the sample size decreases </li></ul></ul>
• 8. Signal to Noise Ratio <ul><ul><li>If you consider δ and σ, the ratio of the two is much like a signal-to-noise ratio </li></ul></ul><ul><ul><li>If the “signal” is large relative to the noise, we can “hear” the signal </li></ul></ul><ul><ul><li>Sample size will increase dramatically as the δ/σ ratio drops </li></ul></ul>Figure panels: Low δ/σ, High δ/σ
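The effect of the signal-to-noise ratio on sample size can be sketched with a standard normal (z) approximation for a two-sided comparison of two means. This is an illustrative assumption, not the calculation Minitab performs (Minitab works from the t distribution and tends to return slightly larger values for small samples). The function name `n_per_group` and the chosen ratios are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(ratio, alpha=0.05, power=0.90):
    """Approximate per-group sample size for a two-sided comparison of
    two means, given the signal-to-noise ratio delta/sigma.

    Normal approximation (an assumption of this sketch):
        n ~= 2 * (z_{1-alpha/2} + z_{power})^2 / ratio^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile for the target power
    return ceil(2 * (z_alpha + z_power) ** 2 / ratio ** 2)


# Halving the ratio roughly quadruples the required sample size.
ratios = [2.0, 1.0, 0.5, 0.25]
sizes = {r: n_per_group(r) for r in ratios}
for r, n in sizes.items():
    print(f"delta/sigma = {r:<4} -> about {n} per group")
```

Because the ratio enters as a square in the denominator, each halving of the ratio multiplies the sample size by about four, which is the "dramatic" growth the slide refers to.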
• 9. Minitab Versus Excel <ul><ul><li>Minitab uses an “infinite population” approach </li></ul></ul><ul><ul><ul><li>Minitab calculators assume the population is relatively infinite </li></ul></ul></ul><ul><ul><ul><li>Relatively infinite means the population is at least ten times larger than the sample used </li></ul></ul></ul><ul><ul><ul><li>Predicts a “safe” sample size (larger than a finite population approach) </li></ul></ul></ul><ul><ul><li>Excel calculators are able to use a “finite population” approach </li></ul></ul><ul><ul><ul><li>They have a “finite population correction factor” </li></ul></ul></ul><ul><ul><ul><li>Adjusts the sample size to account for when we are sampling a significant portion of the population </li></ul></ul></ul>
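The slide does not give the correction formula, so the following is only a sketch using the common textbook finite population correction, n_adj = n / (1 + (n − 1) / N). Whether the Excel calculators mentioned here use exactly this form is an assumption. The function name `finite_population_n` and the example numbers are hypothetical.

```python
from math import ceil


def finite_population_n(n_infinite, population):
    """Adjust an infinite-population sample size for a finite population.

    Uses the textbook correction n_adj = n / (1 + (n - 1) / N),
    which is an assumption; the source does not state the formula.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return ceil(n_infinite / (1 + (n_infinite - 1) / population))


# The correction matters when the sample is a large share of the population
# and fades away as the population grows.
for population in (1_000, 4_380, 1_000_000):
    adjusted = finite_population_n(438, population)
    print(f"N = {population:>9,}: n = {adjusted}")
```

With the population ten times the uncorrected sample size, the adjustment is already under 10%, which fits the "relatively infinite" rule of thumb above.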
• 10. Calculating Sample Size in Minitab <ul><ul><li>Stat>Power and Sample Size>{Select as needed} </li></ul></ul>Enter values for any two of the three parameters (sample size, difference, power) and Minitab will calculate the third. Multiple values, separated by spaces, can be entered for any or all of them.
• 11. Wastewater Sample Size Example <ul><ul><li>You are going to perform a statistical test to determine if there is a difference in the average suspended solids level for two processing lines at a wastewater treatment plant. A suspended solids difference of 10 units or less is unimportant to you for the purpose of this test, but you would like to detect a difference > 10. The historical process standard deviation is 5. </li></ul></ul>
• 12. Wastewater Sample Size Example: Stat > Power and Sample Size > 2-Sample t
• 13. Minitab Output: Power and Sample Size, 2-Sample t Test. Testing mean 1 = mean 2 (versus not =). Calculating power for mean 1 = mean 2 + difference. Alpha = 0.05. Assumed standard deviation = 5.

| Difference | Sample Size | Target Power | Actual Power |
| --- | --- | --- | --- |
| 10 | 7 | 0.9 | 0.929070 |

The sample size is for each group.
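As a rough hand check of this output, the normal (z) approximation for a two-sided, two-sample comparison can be worked with α = 0.05, power = 0.90, a standard deviation of 5, and a difference of 10. This is an approximation chosen for illustration; Minitab's own calculation is based on the t distribution.

$$
n \approx \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2}{\delta^2}
  = \frac{2\,(1.960 + 1.282)^2 \cdot 5^2}{10^2}
  \approx 5.25
$$

Rounding up gives about 6 per group. The approximation ignores the extra uncertainty from estimating the standard deviation from a small sample, so it is consistent with Minitab's slightly larger answer of 7 per group.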
• 14. Wastewater Sample Size – Pt. 2 <ul><ul><li>“Wow! Seven samples are not that many. I was prepared to gather 25 samples. How small of a difference can I detect if I collect 10, 15, 20, or the entire 25 samples?” </li></ul></ul>Power and Sample Size, 2-Sample t Test. Testing mean 1 = mean 2 (versus not =). Calculating power for mean 1 = mean 2 + difference. Alpha = 0.05. Assumed standard deviation = 5.

| Sample Size | Power | Difference |
| --- | --- | --- |
| 10 | 0.9 | 7.66846 |
| 15 | 0.9 | 6.13222 |
| 20 | 0.9 | 5.25996 |
| 25 | 0.9 | 4.67878 |

The sample size is for each group. Notice how sample size increases dramatically as the difference to detect becomes smaller and smaller.
• 15. Class Exercise <ul><ul><li>Recalculate the sample size for the previous problem using a 1% α and a 0.80 power. </li></ul></ul>(10 min)
• 16. Homework - Back to Pat’s Invoice Problem <ul><ul><li>Our old friend Pat is starting to wonder about the validity of a great number of past decisions. In this case, Pat now realizes that the past practice of guessing at the number of invoices to inspect (as was done in previous modules) wasn’t the most reliable. How many data points will Pat need to inspect to confirm or rule out a 10% defect rate if the true defect rate were 12%, 15%, 20%, or 25%? </li></ul></ul>
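One way to approach this homework outside Minitab is the normal-approximation formula for a two-sided, one-proportion test, sketched below with α = 0.05 and power = 0.90. That Minitab uses this same approximation is an assumption, although under it the sketch is expected to agree with the sample sizes in the slide notes (2523, 438, 122, 59). The function name `one_proportion_n` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def one_proportion_n(p0, p1, alpha=0.05, power=0.90):
    """Sample size to detect a true proportion p1 when testing H0: p = p0.

    Two-sided test, normal approximation (an assumption of this sketch):
        n = ((z_{1-alpha/2} * sqrt(p0*q0) + z_{power} * sqrt(p1*q1)) / (p1 - p0))^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_power * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


# Hypothesized defect rate of 10% against the four alternatives in the homework.
alternatives = [0.12, 0.15, 0.20, 0.25]
sizes = [one_proportion_n(0.10, p1) for p1 in alternatives]
for p1, n in zip(alternatives, sizes):
    print(f"alternative proportion {p1:.2f}: n = {n}")
```

The closer the alternative proportion is to 10%, the more invoices are needed, which is the same pattern seen in the wastewater example.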
• 17. Selecting Data for the Stat Test <ul><ul><li>Now that we know how many data points to include in the statistical test, we need to identify which samples should be placed in the test. </li></ul></ul><ul><ul><li>Assume you have several hundred data points collected over time, but the sample size calculation showed you need only 35 for the statistical test. How do we pick the appropriate 35? </li></ul></ul><ul><ul><ul><li>The 35 “best” or “worst” will certainly skew our conclusions </li></ul></ul></ul><ul><ul><ul><li>35 from the center of the data will not show the appropriate variability </li></ul></ul></ul><ul><ul><li>Let’s have Minitab do it for us! </li></ul></ul>
• 18. Generating Random Data <ul><ul><li>Use Minitab to generate 300 random data points from a normal distribution with a mean of 100 and a standard deviation of 10. </li></ul></ul><ul><ul><ul><li>Calc>Random Data>Normal </li></ul></ul></ul>
• 19. Selecting Data at Random <ul><ul><li>Use the following to select 35 random data points </li></ul></ul><ul><ul><ul><li>Calc>Random Data>Sample from Columns </li></ul></ul></ul>
• 20. Randomly Selected Data This procedure works equally well with text or numerical values (a wonderful way to select the sequence for Black Belts to present their projects in class).
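For readers without Minitab, the same two steps (generate normal data, then draw a random subset) can be sketched with Python's standard `random` module. The fixed seed, the variable names, and the presenter names are illustrative assumptions. Sampling is done without replacement, which is assumed to be the intent here.

```python
import random

# Fixed seed so the sketch is repeatable (an assumption; omit for a fresh draw).
rng = random.Random(2024)

# Step 1: 300 normally distributed points, mean 100, standard deviation 10.
data = [rng.gauss(100, 10) for _ in range(300)]

# Step 2: select 35 of them at random, without replacement.
selected = rng.sample(data, 35)
print(f"selected {len(selected)} of {len(data)} points")

# The same call works for text values, e.g. a presentation order.
presenters = ["Ana", "Ben", "Chen", "Dee"]  # hypothetical names
order = rng.sample(presenters, len(presenters))
print("presentation order:", order)
```

Letting the random number generator choose avoids the bias described earlier, where picking the "best", "worst", or central values would distort the test.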
• 21. Learning Check – Sample Size <ul><li>Upon completion of this module, the student should be able to: </li></ul><ul><ul><li>List and define the variables which contribute to determining the correct sample size. </li></ul></ul><ul><ul><li>Calculate the appropriate sample size for a defined set of variables </li></ul></ul>