Statr sessions 11 to 12


Praxis Weekend Business Analytics

Published in: Education, Technology

  1. 1. Learning Objectives • Determine when to use sampling. • Determine the pros and cons of various sampling techniques. • Be aware of the different types of errors that can occur in a study. • Understand the impact of the Central Limit Theorem on statistical analysis. • Use the sampling distributions of the sample mean and sample proportion.
  2. 2. Reasons for Sampling • Sampling – A means of gathering information about a population without conducting a census – Information is gathered from a sample, and an inference is made about the population • Sampling has advantages over a census – Sampling can save money. – Sampling can save time.
  3. 3. Random versus Nonrandom Sampling • Nonrandom sampling – Not every unit of the population has the same probability of being included in the sample • Random sampling – Every unit of the population has the same probability of being included in the sample.
  4. 4. Sampling from a Frame • A sample is taken from a population list, map, directory, or other source used to represent the population, which is called a frame. • Frames can be telephone directories, school lists, trade association lists, or even lists sold by brokers. • In theory, the target population and the frame are the same. In reality, frames may suffer from over-registration (listed units that are not in the target population) or under-registration (target units missing from the list).
  5. 5. Random Sampling Techniques • Simple Random Sampling – basis for other random sampling techniques – Each unit is numbered from 1 to N (the size of the population) – A random number generator can be used to select n items that form the sample – Easier to perform on small populations. The process of numbering all members of a population is cumbersome for large populations
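A minimal sketch of simple random sampling, assuming the frame is just the unit numbers 1 to N. The function name `simple_random_sample` and the values N = 500 and n = 10 are illustrative and do not come from the slides.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)
    # range(1, N + 1) plays the role of the numbered frame;
    # sample() draws without replacement, so no unit is picked twice.
    return rng.sample(range(1, population_size + 1), sample_size)

sample = simple_random_sample(population_size=500, sample_size=10, seed=1)
print(sorted(sample))
```

Passing a seed only makes the draw repeatable for teaching purposes; leave it as None for a fresh sample each time.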
  6. 6. Random Sampling Techniques • Systematic Random Sampling – Every kth item is selected to produce a sample of size n from a population of size N – Value of k is called sampling cycle – Define k = N/n. Choose one random unit from first k units, and then select every kth unit from there – Used because of convenience and relative ease of administration – A knowledgeable person can easily determine whether a sampling plan has been followed.
  7. 7. Systematic Random Sampling: Example • Purchase orders for the previous fiscal year are serialized 1 to 10,000 (N = 10,000). • A sample of fifty (n = 50) purchase orders is needed for an audit. • k = 10,000/50 = 200
  8. 8. Systematic Sampling: Example • First sample element randomly selected from the first 200 purchase orders. Assume the 45th purchase order was selected. • Subsequent sample elements: 45, 245, 445, 645, . . .
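The purchase-order example can be reproduced with a short sketch. The function name `systematic_sample` is illustrative, and the sketch assumes N is an exact multiple of n, as it is in the example.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit number, with k = N // n."""
    k = population_size // sample_size
    if start is None:
        # Random starting point among the first k units.
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

# Slide example: N = 10,000, n = 50, and the 45th order drawn as the start.
orders = systematic_sample(10_000, 50, start=45)
print(orders[:4], orders[-1])  # [45, 245, 445, 645] 9845
```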
  9. 9. Random Sampling Techniques • Systematic Random Sampling: Problems – Problems can occur if the data are subject to any periodicity and the sampling interval is in step with it; the sample is then no longer random – Example: a list of 150 college students is actually a merged list of 5 classes with 30 students each, and each class is ordered with the top students first and the bottom students last. Systematic sampling of every 30th student may select all top, all bottom, or all mediocre students, i.e. the list is subject to cyclical organization
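The periodicity problem can be seen in a small sketch of the merged class list. The names `merged_list` and `every_kth` and the starting position 3 are illustrative assumptions.

```python
CLASS_SIZE = 30
NUM_CLASSES = 5

# Each entry is (class_number, rank_in_class); rank 1 is the top student.
merged_list = [
    (c, r)
    for c in range(1, NUM_CLASSES + 1)
    for r in range(1, CLASS_SIZE + 1)
]

def every_kth(items, k, start):
    """Take every k-th item, beginning at the 1-based position `start`."""
    return items[start - 1::k]

picked = every_kth(merged_list, k=30, start=3)
ranks = {rank for _, rank in picked}
print(picked)  # [(1, 3), (2, 3), (3, 3), (4, 3), (5, 3)]
print(ranks)   # {3}
```

Because the interval equals the class size, every selected student holds the same rank in their class, whichever starting position is drawn.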
  10. 10. Random Sampling Techniques • Stratified Random Sampling – The population is broken down into strata, i.e. homogeneous segments with like characteristics (e.g. men and women, OR old, young, and middle-aged people, OR high-income, mid-income, and low-income groups), and then Simple/Systematic Random Sampling is done within each stratum. – Efficient when differences between strata exist – The technique capitalizes on the known homogeneity of subpopulations so that only relatively small samples are required to estimate the characteristic for each stratum or group – Proportionate: the % of the sample taken from each stratum equals the % that stratum makes up of the whole population
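A minimal sketch of proportionate stratified sampling. The strata sizes (200, 500, 300), the unit labels, and the function name are made up for illustration; a simple random sample is drawn inside each stratum.

```python
import random

def proportionate_stratified_sample(strata, sample_size, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Rounding can make the shares miss `sample_size` by a unit or two
        # when the proportions are not exact; here they divide evenly.
        share = round(sample_size * len(units) / total)
        sample[name] = rng.sample(units, share)
    return sample

# Hypothetical frame: 1,000 people split into three income strata.
strata = {
    "high_income": [f"H{i}" for i in range(200)],
    "mid_income": [f"M{i}" for i in range(500)],
    "low_income": [f"L{i}" for i in range(300)],
}
chosen = proportionate_stratified_sample(strata, sample_size=100, seed=7)
print({name: len(units) for name, units in chosen.items()})
# {'high_income': 20, 'mid_income': 50, 'low_income': 30}
```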
  11. 11. Random Sampling Techniques • Cluster (or Area) Sampling – The population is in pre-determined clusters (students in classes, colleges, towns, companies, areas of a city, geographic regions, etc.) – The technique identifies clusters that tend to be internally heterogeneous – Each cluster contains a wide variety of elements and is a miniature of the population – A random sample of clusters is chosen, and all or some units within the chosen clusters are used as the sample – Advantages: convenience and cost. Clusters are convenient to obtain, and the cost of sampling falls because the scope of the study is reduced to the chosen clusters
  12. 12. Random Sampling Techniques • Important to remember: – In Stratified Random Sampling, each stratum is a homogeneous group of the population – In Cluster Sampling, each cluster is a heterogeneous group of the population
  13. 13. Convenience (Nonrandom) Sampling • Nonrandom sampling – Sampling techniques that select elements from the population by any mechanism that does not involve a random selection process – These techniques are not desirable for making statistical inferences – Examples: choosing members of this class as an accurate representation of all students at our university, or selecting the first five people who walk into a store and asking them about their shopping preferences
  14. 14. Non-sampling Errors • Non-sampling Errors – all errors that exist other than the variation expected due to random sampling – Missing data, data entry, and analysis errors – Leading questions, poorly conceived concepts, unclear definitions, and defective questionnaires – Response errors occur when people do not know, will not say, or overstate in their answers
  15. 15. Process of Inferential Statistics • Proper analysis and interpretation of a sample statistic requires knowledge of its distribution. • [Diagram: select a random sample]
  16. 16. What is a Sampling Distribution? • Recall that a statistic has a numerical value that can be computed (observed) once a sample data set is available. • Three points are crucial in this context: – Because a sample is only a part of the population, the numerical value of a statistic cannot be expected to give us the exact value of the parameter – The observed value of a statistic depends on the particular sample that happens to be selected – There will be some variability in the observed values of a statistic over different occasions of sampling
  17. 17. What is a Sampling Distribution? • The value of a Statistic varies in repeated sampling. • In other words, a Statistic is a random variable and hence has its own probability distribution • Sampling Distribution is the Probability Distribution of a Statistic • The qualifier Sampling indicates that the distribution is conceived in the context of repeated sampling from a population • The qualifier is often dropped to say the distribution of a statistic
  18. 18. Statistic and Sampling Distribution • In any given situation, we are often limited to one sample and the corresponding single observed value of a statistic • However, over different samples the statistic varies according to its sampling distribution • The sampling distribution of a statistic is determined by – the probability distribution f(x) that governs the population – the sample size n
  19. 19. Central Limit Theorem • Consider taking a sample of size n from a population • The sampling distribution of the sample mean is the distribution of the means of repeated samples of size n from a population • The central limit theorem states that as the sample size increases, – the shape of the sampling distribution of the sample mean approaches a normal distribution, whatever the shape of the population, provided its variance is finite (this condition is typically considered to be met when n is at least 30) – the variance of the sample mean equals the population variance divided by n, so it shrinks as n grows
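A small simulation can make the variance claim concrete. This is a sketch under assumptions not in the slides: the population is exponential with mean 1 and variance 1 (chosen only because it is clearly non-normal), and 5,000 repeated samples are drawn for each sample size.

```python
import random
import statistics

def sample_means(sample_size, replications, rng):
    """Return the means of `replications` samples, each of size `sample_size`."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(replications)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
results = {}
for n in (5, 30, 100):
    means = sample_means(n, 5000, rng)
    # Store (mean of the sample means, variance of the sample means).
    results[n] = (statistics.fmean(means), statistics.variance(means))
    print(f"n={n:>3}  mean of means={results[n][0]:.3f}  "
          f"variance of means={results[n][1]:.4f}  sigma^2/n={1 / n:.4f}")
```

The variance of the simulated sample means should track sigma^2/n closely for each n; plotting a histogram of `means` would show the shape moving toward a bell curve as n grows.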
  20. 20. Sampling from a Normal Population The distribution of sample means is normal for any sample size.
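  21. 21. z Formula for Sample Means • $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, where $\bar{x}$ is the sample mean, $\mu$ the population mean, $\sigma$ the population standard deviation, and $n$ the sample size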
  22. 22. Tyre Store Example Suppose that the mean expenditure per customer at a tyre store is $85.00, with a standard deviation of $9.00. If a random sample of 40 customers is taken, what is the probability that the sample average expenditure per customer for this sample will be $87.00 or more? Solution: Because the sample size is greater than 30, the central limit theorem can be used to state that the sample mean is normally distributed and the problem can proceed using the normal distribution calculations.
  23. 23. Solution to Tyre Store Example
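  24. 24. Graphic Solution to Tyre Store Example • Standard error: $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 9/\sqrt{40} = 1.42$ • $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{87 - 85}{9/\sqrt{40}} = \dfrac{2}{1.42} = 1.41$ • Area between $z = 0$ and $z = 1.41$: .4207 • $P(\bar{x} \ge 87) = .5000 - .4207 = .0793$ • [Figure: the area to the right of $\bar{x} = 87$ on the $\bar{x}$ curve and the area to the right of $z = 1.41$ on the $z$ curve are equal areas of .0793]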
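The tyre store calculation can be checked with Python's standard library. This is a sketch, not the slides' own method: `NormalDist` stands in for the printed z table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 85.0, 9.0, 40       # population mean, std. deviation, sample size
standard_error = sigma / sqrt(n)   # about 1.42
z = (87.0 - mu) / standard_error   # about 1.41 after rounding

# Table-style answer: round z to two decimals first, as a printed table forces.
p_table = 1 - NormalDist().cdf(round(z, 2))
# Answer without the intermediate rounding.
p_exact = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P(sample mean >= 87) = {p_table:.4f} (table-style)")
print(f"without rounding z: {p_exact:.4f}")
```

The two answers differ slightly only because the table method rounds z to 1.41 before looking up the area.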
  25. 25. Demonstration Problem 7.1 Suppose that during any hour in a large department store, the average number of shoppers is 448, with a standard deviation of 21 shoppers. What is the probability that a random sample of 49 different shopping hours will yield a sample mean between 441 and 446 shoppers?
  26. 26. Demonstration Problem 7.1
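  27. 27. Graphic Solution for Demonstration Problem 7.1 • Standard error: $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 21/\sqrt{49} = 3$ • $z = \dfrac{441 - 448}{3} = -2.33$ and $z = \dfrac{446 - 448}{3} = -0.67$ • Area between $z = -2.33$ and $z = 0$: .4901; area between $z = -0.67$ and $z = 0$: .2486 • $P(441 \le \bar{x} \le 446) = .4901 - .2486 = .2415$ • [Figure: the shaded region between 441 and 446 on the $\bar{x}$ curve (mean 448) matches the region between $-2.33$ and $-0.67$ on the $z$ curve]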
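Demonstration Problem 7.1 can be checked the same way. As before, this is a sketch that uses `NormalDist` in place of the z table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 448.0, 21.0, 49
standard_error = sigma / sqrt(n)        # 21 / 7 = 3

z_low = (441.0 - mu) / standard_error   # about -2.33
z_high = (446.0 - mu) / standard_error  # about -0.67

std_normal = NormalDist()
# Table-style answer: round both z values to two decimals first.
p_table = std_normal.cdf(round(z_high, 2)) - std_normal.cdf(round(z_low, 2))
# Answer without the intermediate rounding.
p_exact = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"z from {z_low:.2f} to {z_high:.2f}: P = {p_table:.4f} (table-style)")
print(f"without rounding z: {p_exact:.4f}")
```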
  28. 28. Exercise in R: Normal Distribution The commands you will learn • dnorm • lines • qqnorm • qqline • rnorm • qqnormsim • pnorm • qnorm Open URL: Go to Labs in R and select 3-Distributions
  29. 29. Exercise in R: Sampling Distribution Here you will learn Central Limit Theorem using the sample() command Open URL: Go to Labs in R and select 4A – Intro to inference
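  30. 30. Z Formula for Sample Proportions • $z = \dfrac{\hat{p} - p}{\sqrt{p\,q/n}}$, where $\hat{p}$ is the sample proportion, $p$ the population proportion, $q = 1 - p$, and $n$ the sample size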
  31. 31. Demonstration Problem 7.3 If 10% of a population of parts is defective, what is the probability of randomly selecting 80 parts and finding that 12 or more parts are defective?
  32. 32. Solution for Demonstration Problem 7.3
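  33. 33. Graphic Solution for Demonstration Problem 7.3 • $p = 0.10$, $\hat{p} = 12/80 = 0.15$ • Standard error: $\sigma_{\hat{p}} = \sqrt{\dfrac{p\,q}{n}} = \sqrt{\dfrac{(.10)(.90)}{80}} = 0.0335$ • $z = \dfrac{\hat{p} - p}{\sqrt{p\,q/n}} = \dfrac{0.15 - 0.10}{0.0335} = \dfrac{0.05}{0.0335} = 1.49$ • Area between $z = 0$ and $z = 1.49$: .4319 • $P(\hat{p} \ge 0.15) = .5000 - .4319 = .0681$ • [Figure: the area to the right of $\hat{p} = 0.15$ on the $\hat{p}$ curve matches the area to the right of $z = 1.49$ on the $z$ curve]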
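A sketch that checks Demonstration Problem 7.3 with the standard library. It follows the slide's normal approximation to the sample proportion with no continuity correction; `NormalDist` replaces the z table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.10, 80                        # population proportion, sample size
p_hat = 12 / n                         # 12 defective parts out of 80 = 0.15
standard_error = sqrt(p * (1 - p) / n) # about 0.0335
z = (p_hat - p) / standard_error       # about 1.49

# Table-style answer: round z to two decimals before looking up the area.
p_table = 1 - NormalDist().cdf(round(z, 2))

print(f"standard error = {standard_error:.4f}, z = {z:.2f}")
print(f"P(12 or more defective) = {p_table:.4f}")
```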