Session 11 12



  1. 8/7/2012

     Sampling, Sampling Distribution, Estimation (cont.), Session XI

     Parameter & Statistic
     Parameter:
     • A characteristic of the population which is of interest in the study
     • Fixed or non-random
     • Unknown (because typically you don't have information about all units of the population)
     Statistic:
     • A characteristic of the sample (to estimate the parameter)
     • Random (because the sample is random)
     • Computable or known once you draw the sample

     Estimator/Estimate and its Bias, Standard Error and Sampling Distribution
     • Value of the Estimator (statistic) for a given sample is your estimate
     • Bias = Mean (expected value) of the Estimator minus the parameter
     • Standard Error = Standard deviation of the Estimator
     • Sampling Distribution is the probability distribution of the Estimator

     Different types of Sampling
     • Random & nonrandom sampling
     • Simple random sampling: SRSWR & SRSWOR
     • Systematic sampling
     • Cluster sampling
     • Stratified sampling
     • Multi-stage sampling, multi-phase sampling
     • Sequential sampling
     • Quota sampling
     • Panel samples
     • Convenience sampling

     Simple Random Sampling (SRS)
     • Each unit in the population has an equal chance of being included in the sample (even position-wise in the sample)
     • SRS with replacement (SRSWR): units already selected are returned before drawing subsequent ones. (The same unit may appear more than once.) Not too realistic, but most useful for theoretical treatment
     • SRS without replacement (SRSWOR):
       – the same unit may not be included more than once
       – selections are not independent
       – if the population size is very large compared to the sample size, SRSWOR can be approximated by SRSWR

     Unbiasedness and Standard error of Sample Mean/Proportion
     $E(\bar{X}) = \mu$, $E(p) = \pi$
     $S.E.(\bar{X}) = \frac{\sigma}{\sqrt{n}}$, $S.E.(p) = \sqrt{\frac{\pi(1-\pi)}{n}}$
     Estimated standard errors:
     $\widehat{S.E.}(\bar{X}) = \frac{S}{\sqrt{n}}$, $\widehat{S.E.}(p) = \sqrt{\frac{p(1-p)}{n}}$
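The two SRS schemes and the standard-error formulas on this slide can be tried out with a short Python sketch. The population of 1000 labelled units, the seed, and the function names are illustrative assumptions, not part of the slides.

```python
import math
import random


def se_mean(sigma, n):
    """Standard error of the sample mean under SRSWR: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def se_proportion(pi, n):
    """Standard error of the sample proportion under SRSWR: sqrt(pi(1-pi)/n)."""
    return math.sqrt(pi * (1 - pi) / n)


def srswr(population, n, rng):
    """Draw n units with replacement; the same unit may appear more than once."""
    return rng.choices(population, k=n)


def srswor(population, n, rng):
    """Draw n units without replacement; no unit can repeat."""
    return rng.sample(population, n)


rng = random.Random(0)             # fixed seed, only to make the sketch reproducible
population = list(range(1, 1001))  # hypothetical population of 1000 labelled units

with_repl = srswr(population, 50, rng)
without_repl = srswor(population, 50, rng)

print(len(set(with_repl)))     # may be below 50 if some unit was drawn twice
print(len(set(without_repl)))  # always 50 under SRSWOR
print(se_mean(41, 750), se_proportion(0.5, 100))
```

Replacing `sigma` by the sample standard deviation S, or `pi` by the sample proportion p, gives the estimated standard errors.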
  2. Finite Population Multiplier (FPM) / Correction (FPC)
     S.E. of Sample Mean / proportion with SRSWOR:
     $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}}$, $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \times \sqrt{\frac{N-n}{N-1}}$
     FPM $= \sqrt{\frac{N-n}{N-1}}$: typically ignored if n/N < 5%
     $\frac{N-n}{N-1} = 1 - \frac{n-1}{N-1} \approx 1 - f$, where $f = \frac{n}{N}$ is the sampling fraction

     Systematic sampling
     • Suppose 50 units are to be chosen from a population of 1000 units.
     • Number the units from 1,…,1000
     • Select one unit from 1,…,20 by SRS, say you get 6.
     • Then your sample consists of the units numbered 6, 26, 46, 66, 86, 106, 126, …, 966, 986
     • Each population unit still has an equal chance of being selected; however, not every sample (combination) is equally likely

     Cluster Sampling
     • Split the population into several groups (called CLUSTERs), so that units within each cluster are as heterogeneous as possible, but the clusters are very similar to one another in terms of the characteristic
     • Select one (or occasionally more) cluster(s) by SRS
     • Include all units of the selected cluster(s) in your sample

     Stratified Sampling
     • Just the opposite of cluster sampling. Now the population is split into groups (called STRATA) so that units within each stratum are as homogeneous as possible
     • Select a few units from each stratum using SRS
     • How many to take from each stratum?
       – Depends on your criterion as well as available information

     Stratified sampling: stratified mean
     Strata        1          2          …   H
     Strata size   N_1        N_2        …   N_H     ($N = \sum_h N_h$)
     Sample size   n_1        n_2        …   n_H
     Strata mean   X̄_1        X̄_2        …   X̄_H
     Stratified mean $= \sum_{h=1}^{H} W_h \bar{X}_h$, where $W_h = \frac{N_h}{N}$

     Proportional Stratified Sampling
     $n_h \propto W_h$ or, $n_h = n W_h$
     • Not always feasible
     • Not always desirable!
     • Stratified mean and the 'usual' mean are the same
     $\mathrm{Variance}(\bar{X}_{\text{stratified}}) = \sum_{h=1}^{H} W_h^2 \frac{\sigma_h^2}{n_h}$
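The systematic-sampling recipe, the finite population multiplier, and the stratified mean above can be sketched in Python as follows. The function names and the three strata used in the usage lines are hypothetical, and the systematic sampler assumes the sample size divides the population size evenly, as in the slide's 1000/50 example.

```python
import math


def systematic_sample(N, n, start):
    """Unit numbers start, start + k, start + 2k, ... with interval k = N // n.

    Assumes n divides N; `start` is the unit picked by SRS from 1..k.
    """
    k = N // n
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and the sampling interval")
    return [start + i * k for i in range(n)]


def fpm(N, n):
    """Finite population multiplier sqrt((N - n) / (N - 1)) for SRSWOR."""
    return math.sqrt((N - n) / (N - 1))


def stratified_mean(strata_sizes, strata_means):
    """Sum of W_h * Xbar_h with weights W_h = N_h / N."""
    N = sum(strata_sizes)
    return sum(N_h / N * xbar_h for N_h, xbar_h in zip(strata_sizes, strata_means))


def proportional_allocation(strata_sizes, n):
    """n_h = n * W_h (may need rounding to whole units in practice)."""
    N = sum(strata_sizes)
    return [n * N_h / N for N_h in strata_sizes]


sample = systematic_sample(1000, 50, start=6)
print(sample[:4], sample[-2:])   # first units 6, 26, 46, 66; last units 966, 986
print(fpm(12368, 750))           # multiplier for N = 12368, n = 750

# Hypothetical strata: sizes 500, 300, 200 with sample means 10, 20, 30.
print(stratified_mean([500, 300, 200], [10, 20, 30]))
print(proportional_allocation([500, 300, 200], 100))
```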
  3. Best choice of sample size when strata variation is known/estimable
     $n_h \propto N_h \sigma_h$, i.e. $n_h = n \frac{W_h \sigma_h}{\sum_i W_i \sigma_i}$

     Determination of sample size in stratified sampling with budget constraint
     $\sum_j c_j n_j \le B$
     $n_j \propto \frac{N_j \sigma_j}{\sqrt{c_j}}$

     Examples of Parameters of interest
     µ = average monthly budget on entertainment
     π = proportion interested in buying the new model of piano
     Understand the estimation problem in the context of stratified sampling:
     $\mu = \sum_{h=1}^{H} W_h \mu_h$, where $W_h = \frac{N_h}{N}$. So $\hat{\mu} = \sum_{h=1}^{H} W_h \hat{\mu}_h = \sum_{h=1}^{H} W_h \bar{X}_h$
     $\pi = \sum_{h=1}^{H} W_h \pi_h$. So $\hat{\pi} = \sum_{h=1}^{H} W_h \hat{\pi}_h = \sum_{h=1}^{H} W_h p_h$

     Criterion for 'good' Estimators
     • Unbiased Estimator
     • Minimum Variance Unbiased Estimator
     • Consistent Estimator

     Central Limit Theorem
     If a large number (typically n ≥ 30) of units are drawn by SRSWR from a population (with any probability distribution that has a finite variance), then the sampling (probability) distribution of the sample mean can be approximated by a normal distribution, i.e.
     $\bar{X} \to N\left(\mu, \frac{\sigma^2}{n}\right)$

     Notes about CLT
     • The real strength of the CLT lies in the fact that the approximation is valid for sampling from ANY population (with finite variance).
     • For certain populations, the approximation will be good even for smaller sample sizes. Typically, of course, the exact sampling distribution of $\bar{X}$ depends on the population probability distribution. If the population is normal, then $\bar{X}$ has a normal distribution for any sample size n.
     • It is easy to see that $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$. You do not need the CLT for that.
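The two allocation rules on this slide can be illustrated with a small Python sketch. The strata sizes, standard deviations, unit costs, and budget are made-up numbers, and the cost-based version additionally assumes the whole budget B is spent, which fixes the proportionality constant.

```python
import math


def optimal_allocation(strata_sizes, sigmas, n):
    """n_h = n * W_h * sigma_h / sum_i(W_i * sigma_i), i.e. n_h proportional to N_h * sigma_h."""
    N = sum(strata_sizes)
    weighted = [N_h / N * s_h for N_h, s_h in zip(strata_sizes, sigmas)]
    total = sum(weighted)
    return [n * w / total for w in weighted]


def budget_allocation(strata_sizes, sigmas, costs, budget):
    """n_j proportional to N_j * sigma_j / sqrt(c_j), scaled so sum_j(c_j * n_j) equals the budget.

    Assumption (not stated on the slide): the budget is spent in full.
    """
    shares = [N_j * s_j / math.sqrt(c_j)
              for N_j, s_j, c_j in zip(strata_sizes, sigmas, costs)]
    cost_of_shares = sum(c_j * share for c_j, share in zip(costs, shares))
    return [budget * share / cost_of_shares for share in shares]


# Hypothetical strata, purely for illustration.
sizes = [500, 300, 200]
sigmas = [10, 20, 30]
costs = [1, 4, 9]      # cost per sampled unit in each stratum

n_opt = optimal_allocation(sizes, sigmas, n=100)
n_budget = budget_allocation(sizes, sigmas, costs, budget=1000)

print(n_opt)
print(n_budget)
print(sum(c * n for c, n in zip(costs, n_budget)))  # total cost, equal to the budget
```

In practice the allocations would be rounded to whole units, which can move the total cost slightly away from B.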
  4. Confidence Interval of µ (σ known)
     $0.95 = P[-1.96 < Z < 1.96]$
     $= P\left[-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right]$
     $= P\left[-1.96 \frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96 \frac{\sigma}{\sqrt{n}}\right]$
     $= P\left[\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}\right]$
     So, the 100(1−α)% C.I. for µ is:
     $\bar{X} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$ (pt. estimate ± table-value × standard error)

     Problem
     Chief of Police Kathy Ackert has recently instituted a crackdown on drug dealers in her city. Since the crackdown began, 750 of the 12,368 drug dealers in the city have been caught. The mean dollar value of drugs found on these 750 dealers is $250,000. The standard deviation of the dollar value of drugs for these 750 dealers is $41,000. Construct for Chief Ackert a 90 percent confidence interval for the mean dollar value of drugs possessed by the city's drug dealers.

     Solution
     Want 90% C.I. for µ based on $\bar{X} = 250K$, $n = 750$, $S = 41K$
     So the C.I. is $250 \pm 1.645 \times \frac{41}{\sqrt{750}}$
     Question: Is it o.k. to replace σ by S?
     Answer: yes, when the sample size n is large (because S is a consistent estimator of σ).
     Strictly speaking, we should be using FPM here!

     Solution
     Want 90% C.I. for µ based on $\bar{X} = 250K$, $n = 750$, $N = 12368$, $S = 41K$
     So the C.I. is $250 \pm 1.645 \times \frac{41}{\sqrt{750}} \times \sqrt{\frac{12368 - 750}{12367}} = (247.61, 252.39)$

     Correct interpretation of confidence level (confidence coefficient)
     $P[247.61 < \mu < 252.39] = 0.90$

     Interesting observations about C.I.
     • Interpretation of the confidence coefficient/level
       – how should we interpret the probability statement?
     • Link between
       – confidence coefficient/level
       – accuracy (length of the C.I.)
       – sample size
     Length: $L = 2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$; half-length: $H = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
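The drug-dealer problem can be checked numerically with a short Python sketch that uses only the standard library. The function names are illustrative. `required_n` is an added rearrangement of the half-length formula H = z σ/√n for n, and it ignores the finite population multiplier.

```python
import math
from statistics import NormalDist


def z_interval(xbar, s, n, conf, N=None):
    """Large-sample C.I. xbar ± z * s / sqrt(n); applies the FPM when N is given."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # table value z_{alpha/2}
    se = s / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))         # finite population multiplier
    return xbar - z * se, xbar + z * se


def required_n(sigma, half_length, conf):
    """Smallest n with z * sigma / sqrt(n) <= half_length (no FPM)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / half_length) ** 2)


# Values from the problem, in thousands of dollars.
without_fpm = z_interval(250, 41, 750, 0.90)
with_fpm = z_interval(250, 41, 750, 0.90, N=12368)

print(without_fpm)
print(with_fpm)                    # slightly narrower than the interval without FPM
print(required_n(41, 2.5, 0.90))   # sample size for a half-length of 2.5K
```

The FPM shrinks the standard error by about 3% here, since the sampling fraction 750/12368 is roughly 6%, just above the 5% rule of thumb.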
  5. If sample size is small?
     • C.I. is valid only if the sampling is done from an (approximately) Normal population
     • σ known? No further change:
       $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
     • σ unknown? Use S as an estimate for σ, and use the t-distribution with n−1 degrees of freedom (d.f.):
       $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim T_{n-1}$

     Practice problem
     Twelve bank tellers were randomly sampled and it was determined they made an average of 3.6 errors per day with a standard deviation of 0.42 error. Construct a 90 percent confidence interval for the population mean of errors per day. Do you need to make any assumption about the number of errors bank tellers make?
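One way to work the practice problem is sketched below in Python. The critical value 1.796 is the usual table value of the t-distribution with 11 d.f. and 5% in the upper tail; it is hard-coded because the standard library has no t quantile function. The interval relies on the assumption the slide asks about: the number of errors per day is (approximately) normally distributed.

```python
import math

# Table value t_{0.05, 11}: upper 5% point of the t-distribution with n - 1 = 11 d.f.
T_CRIT_90_DF11 = 1.796


def t_interval(xbar, s, n, t_crit):
    """Small-sample C.I. xbar ± t * s / sqrt(n), assuming a normal population."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


low, high = t_interval(xbar=3.6, s=0.42, n=12, t_crit=T_CRIT_90_DF11)
print(round(low, 3), round(high, 3))
```

Using the normal table value 1.645 instead of 1.796 would give a narrower interval that understates the uncertainty for a sample this small.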