This document discusses sampling distributions and estimation. It covers key concepts such as parameters versus statistics, sampling with and without replacement, unbiasedness, standard error, and the central limit theorem. It also discusses different types of sampling methods like simple random sampling, systematic sampling, stratified sampling, and cluster sampling. Confidence intervals for means and proportions are presented for simple random sampling when the population standard deviation is known and unknown.
1. 8/7/2012
Sampling Distribution and Estimation (cont.)
Session XI

Parameter & Statistic
Parameter:
• A characteristic of the population which is of interest in the study
• Fixed or non-random
• Unknown (because typically you don't have information about all units of the population)
Statistic:
• A characteristic of the sample (to estimate the parameter)
• Random (because the sample is random)
• Computable or known once you draw the sample

Estimator/Estimate and its Bias, Standard Error and Sampling Distribution
• Value of the Estimator (statistic) for a given sample is your estimate
• Bias = Mean (expected value) of the Estimator minus the parameter
• Standard Error = Standard deviation of the Estimator
• Sampling Distribution is the probability distribution of the Estimator

Different types of Sampling
Random & nonrandom sampling
Simple random sampling: SRSWR & SRSWOR
Systematic sampling
Cluster sampling
Stratified sampling
Multi-stage sampling - Multi-phase sampling
Sequential sampling
Quota sampling
Panel samples
Convenience sampling
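The definitions of bias, standard error and sampling distribution can be illustrated by simulation. The sketch below is only an illustration under assumed inputs: the population (the integers 1 to 1000), the sample size 25, the 5000 replications and the helper name `simulate_sample_means` are all hypothetical choices, not part of the slides.

```python
import random
import statistics

def simulate_sample_means(population, n, reps, seed=0):
    """Draw `reps` samples of size n with replacement and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical population, chosen only for illustration.
population = list(range(1, 1001))
mu = statistics.fmean(population)       # parameter: fixed, normally unknown
sigma = statistics.pstdev(population)   # population standard deviation

n = 25
means = simulate_sample_means(population, n=n, reps=5000)

# The list `means` approximates the sampling distribution of the sample mean.
bias = statistics.fmean(means) - mu     # mean of the estimator minus the parameter
std_error = statistics.pstdev(means)    # standard deviation of the estimator

print(f"estimated bias       : {bias:.3f}")
print(f"simulated std. error : {std_error:.2f}")
print(f"sigma / sqrt(n)      : {sigma / n ** 0.5:.2f}")
```

The estimated bias should be close to zero and the simulated standard error close to σ/√n, up to simulation noise.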
Simple Random Sampling (SRS)
• Each unit in the population has equal chance of being included in the sample (even position-wise in the sample)
• SRS with replacement (SRSWR): units already selected are returned before drawing subsequent ones. (Same unit may appear more than once.) Not too realistic but most useful for theoretical treatment
• SRS without replacement (SRSWOR):
– same unit may not be included more than once
– selections are not independent
– if the population size is very large compared to sample size, SRSWOR can be considered/approximated by SRSWR

Unbiasedness and Standard error of Sample Mean/Proportion
$E(\bar{X}) = \mu$, $E(p) = \pi$
$\text{S.E.}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}$, $\text{S.E.}(p) = \sqrt{\dfrac{\pi(1-\pi)}{n}}$
Estimated standard errors:
$\text{S.E.}(\bar{X}) = \dfrac{S}{\sqrt{n}}$, $\text{S.E.}(p) = \sqrt{\dfrac{p(1-p)}{n}}$
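As a quick numerical check of the standard error formulas for SRSWR, the following sketch evaluates them for made-up inputs. The function names `se_mean` and `se_proportion` and the numbers passed to them are hypothetical and serve only as an illustration.

```python
import math

def se_mean(sd, n):
    """Standard error of the sample mean: sd / sqrt(n).
    Pass sigma for the true S.E. or the sample S for the estimated S.E."""
    return sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of the sample proportion: sqrt(p(1-p)/n).
    Pass pi for the true S.E. or the sample p for the estimated S.E."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical inputs: S = 10 with n = 100, and p = 0.5 with n = 100.
print(se_mean(10.0, 100))
print(se_proportion(0.5, 100))
```

Because n enters through its square root, quadrupling the sample size only halves either standard error.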
Finite Population Multiplier (FPM)/Correction (FPC)
S.E. of Sample Mean / proportion with SRSWOR:
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \times \sqrt{\dfrac{N-n}{N-1}}$, $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}} \times \sqrt{\dfrac{N-n}{N-1}}$
FPM $= \sqrt{\dfrac{N-n}{N-1}}$: typically ignored if n/N < 5%
$\dfrac{N-n}{N-1} = 1 - \dfrac{n-1}{N-1} \approx 1 - f$, where $f = \dfrac{n}{N}$ is the sampling fraction

Systematic sampling
• Suppose 50 units are to be chosen from a population of 1000 units.
• Number the units from 1,…, 1000
• Select one unit from 1,…,20 by SRS, say you get 6.
• Then your sample consists of units having the numbers 6, 26, 46, 66, 86, 106, 126, …, 966, 986
• Each population unit still has equal chance of being selected; however, not every sample (combination) is equally likely
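The systematic sampling steps and the finite population multiplier can be sketched in a few lines. This is a minimal illustration that assumes the sample size divides the population size exactly; the function names `systematic_sample` and `fpm` are hypothetical.

```python
import math
import random

def systematic_sample(N, n, start):
    """Unit numbers (1-based) of a systematic sample with the given random start.
    Assumes n divides N, so the sampling interval is k = N // n."""
    k = N // n
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and the sampling interval")
    return list(range(start, N + 1, k))

def fpm(N, n):
    """Finite population multiplier sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

# The example from the slide: N = 1000, n = 50, random start 6.
slide_sample = systematic_sample(1000, 50, 6)
print(slide_sample[:4], "...", slide_sample[-2:])

# In practice the start itself is drawn by SRS from 1..k.
k = 1000 // 50
random_start = random.Random(0).randint(1, k)
random_sample = systematic_sample(1000, 50, random_start)

# FPM for the same N and n, next to the approximation sqrt(1 - f).
print(fpm(1000, 50), math.sqrt(1 - 50 / 1000))
```

Only 20 distinct samples are possible here, one per starting value, which is why not every combination of 50 units is equally likely even though each unit has selection probability 1/20.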
Cluster Sampling
• Split the population into several groups (called CLUSTERs), so that units within each cluster are as heterogeneous as possible, but the clusters are very similar to one another in terms of the characteristic
• Select one (or occasionally more) cluster(s) by SRS
• Include all units of the selected cluster(s) in your sample

Stratified Sampling
• Just the opposite of cluster sampling. Now the population is split into groups (called STRATA) so that units within each stratum are as homogeneous as possible
• Select a few units from each stratum using SRS
• How many to take from each stratum?
– Depends on your criterion as well as available information
Stratified sampling: stratified mean
Strata         1             2             …    H
Strata size    $N_1$         $N_2$         …    $N_H$         ($N = \sum N_h$)
Sample size    $n_1$         $n_2$         …    $n_H$
Strata mean    $\bar{X}_1$   $\bar{X}_2$   …    $\bar{X}_H$
Stratified mean $= \sum_{h=1}^{H} W_h \bar{X}_h$, where $W_h = \dfrac{N_h}{N}$

Proportional Stratified Sampling
$n_h \propto W_h$ or, $n_h = n W_h$
• Not always feasible
• Not always desirable!
• Stratified mean and the 'usual' mean are the same
$\text{Variance}(\bar{X}_{\text{stratified}}) = \sum_{h=1}^{H} W_h^2 \dfrac{\sigma_h^2}{n_h}$
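The stratified mean, its variance and proportional allocation can be checked numerically. The sketch below uses three made-up strata; the sizes, means and standard deviations, as well as the function names, are hypothetical and only illustrate the formulas.

```python
def stratified_mean(N_h, xbar_h):
    """Sum of W_h * xbar_h with W_h = N_h / N."""
    N = sum(N_h)
    return sum(Nh / N * xb for Nh, xb in zip(N_h, xbar_h))

def stratified_variance(N_h, sigma_h, n_h):
    """Sum of W_h^2 * sigma_h^2 / n_h."""
    N = sum(N_h)
    return sum((Nh / N) ** 2 * s ** 2 / nh
               for Nh, s, nh in zip(N_h, sigma_h, n_h))

def proportional_allocation(N_h, n):
    """n_h = n * W_h (may be fractional; round in practice)."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]

# Hypothetical strata.
N_h = [500, 300, 200]        # strata sizes
xbar_h = [10.0, 20.0, 30.0]  # strata sample means
sigma_h = [2.0, 4.0, 6.0]    # strata standard deviations

n_h = proportional_allocation(N_h, n=100)
print(n_h)
print(stratified_mean(N_h, xbar_h))
print(stratified_variance(N_h, sigma_h, n_h))
```

For these inputs the weights are 0.5, 0.3 and 0.2, so the stratified mean is 0.5·10 + 0.3·20 + 0.2·30 = 17 and the variance is 0.02 + 0.048 + 0.072 = 0.14.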
Best choice of sample size when strata variation is known/estimable
$n_h \propto N_h \sigma_h$
$n_h = n \dfrac{W_h \sigma_h}{\sum_i W_i \sigma_i}$

Determination of sample size in stratified sampling with budget constraint $\sum_j c_j n_j \le B$
$n_j \propto \dfrac{N_j \sigma_j}{\sqrt{c_j}}$

Examples of Parameters of interest
$\mu$ = average monthly budget on entertainment
$\pi$ = proportion interested in buying the new model of piano
Understand the estimation problem in the context of stratified sampling:
$\mu = \sum_{h=1}^{H} W_h \mu_h$, where $W_h = \dfrac{N_h}{N}$. So $\hat{\mu} = \sum_{h=1}^{H} W_h \hat{\mu}_h = \sum_{h=1}^{H} W_h \bar{X}_h$
$\pi = \sum_{h=1}^{H} W_h \pi_h$. So $\hat{\pi} = \sum_{h=1}^{H} W_h \hat{\pi}_h = \sum_{h=1}^{H} W_h p_h$

Criterion for 'good' Estimators
• Unbiased Estimator
• Minimum Variance Unbiased Estimator
• Consistent Estimator
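The two allocation rules can be compared on a small example. This is a sketch under assumed inputs: the strata sizes, standard deviations, per-unit costs and budget are hypothetical, as are the function names, and the cost-based rule is scaled so that the whole budget B is spent.

```python
import math

def variance_based_allocation(N_h, sigma_h, n):
    """n_h proportional to N_h * sigma_h, scaled to total sample size n."""
    weights = [Nh * s for Nh, s in zip(N_h, sigma_h)]
    total = sum(weights)
    return [n * w / total for w in weights]

def budget_allocation(N_h, sigma_h, c_h, budget):
    """n_j proportional to N_j * sigma_j / sqrt(c_j), scaled so that
    sum(c_j * n_j) equals the budget."""
    weights = [Nh * s / math.sqrt(c) for Nh, s, c in zip(N_h, sigma_h, c_h)]
    cost_of_weights = sum(c * w for c, w in zip(c_h, weights))
    scale = budget / cost_of_weights
    return [scale * w for w in weights]

# Hypothetical strata.
N_h = [500, 300, 200]
sigma_h = [2.0, 4.0, 6.0]
c_h = [1.0, 4.0, 9.0]     # assumed cost per sampled unit in each stratum

by_variance = variance_based_allocation(N_h, sigma_h, n=100)
by_budget = budget_allocation(N_h, sigma_h, c_h, budget=1000.0)
total_cost = sum(c * nj for c, nj in zip(c_h, by_budget))

print([round(x, 2) for x in by_variance])
print([round(x, 2) for x in by_budget], round(total_cost, 2))
```

The resulting sizes are generally fractional and would be rounded in practice. Compared with the variance-only rule, the budget rule shifts sample away from strata that are expensive to survey.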
Central Limit Theorem
http://www.statisticalengineering.com/central_limit_theorem_(triangle).htm
If a large number (typically n ≥ 30) of units are drawn by SRSWR from a population (with any probability distribution having finite variance), then the sampling (probability) distribution of the sample mean can be approximated by a normal distribution, i.e.
$\bar{X} \to N\left(\mu, \dfrac{\sigma^2}{n}\right)$

Notes about CLT
• The real strength of the CLT lies with the fact that the approximation is valid for sampling from ANY population with finite variance.
• For certain populations, the approximation will be good even for smaller sample sizes. Typically, of course, the exact sampling distribution of $\bar{X}$ depends on the population probability distribution. If the population is normal, then $\bar{X}$ has a normal distribution for any sample size n.
• It is easy to see that $E(\bar{X}) = \mu$ and $\text{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$. You do not need CLT for that.
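A small simulation can make the CLT statement concrete. The sketch below assumes a strongly skewed population (exponential with rate 1, so µ = 1 and σ = 1), n = 30 and 4000 replications; all of these choices, including the seed, are hypothetical and only serve as an illustration.

```python
import random
import statistics

rng = random.Random(42)
n, reps = 30, 4000
mu, sigma = 1.0, 1.0   # mean and standard deviation of the Exp(1) population

# Each entry is the mean of one sample of n independent draws.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
se = sigma / n ** 0.5

# If the normal approximation holds, about 95% of the sample means
# should fall within 1.96 standard errors of mu.
within = sum(abs(m - mu) < 1.96 * se for m in sample_means) / reps

print(f"mean of sample means: {mean_of_means:.3f} (mu = {mu})")
print(f"sd of sample means  : {sd_of_means:.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```

The first two lines of output reflect E(X̄) = µ and Var(X̄) = σ²/n, which hold without the CLT; only the third line, the share within 1.96 standard errors, relies on the normal approximation.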
Confidence Interval of µ (σ known)
$0.95 = P[-1.96 < Z < 1.96]$
$= P\left[-1.96 < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right]$
$= P\left[-1.96 \dfrac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96 \dfrac{\sigma}{\sqrt{n}}\right]$
$= P\left[\bar{X} - 1.96 \dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \dfrac{\sigma}{\sqrt{n}}\right]$
So, 100(1-α)% C.I. for µ is: $\bar{X} \pm Z_{\alpha/2} \times \dfrac{\sigma}{\sqrt{n}}$
(pt. estimate: $\bar{X}$; table-value: $Z_{\alpha/2}$; standard error: $\dfrac{\sigma}{\sqrt{n}}$)

Problem
Chief of Police Kathy Ackert has recently instituted a crackdown on drug dealers in her city. Since the crackdown began, 750 of the 12,368 drug dealers in the city have been caught. The mean dollar value of drugs found on these 750 dealers is $250,000. The standard deviation of the dollar value of drugs for these 750 dealers is $41,000. Construct for Chief Ackert a 90 percent confidence interval for the mean dollar value of drugs possessed by the city's drug dealers.
Solution
Want 90% C.I. for µ based on $\bar{X} = 250K$, $n = 750$, $S = 41K$
So the C.I. is $250 \pm 1.645 \times \dfrac{41}{\sqrt{750}}$
Question: Is it o.k. to replace σ by S?
Answer: yes, when the sample size n is large (because S is a consistent estimator of σ).
Strictly speaking, we should be using FPM here!

Solution (with FPM)
Want 90% C.I. for µ based on $\bar{X} = 250K$, $n = 750$, $N = 12368$, $S = 41K$
So the C.I. is $250 \pm 1.645 \times \dfrac{41}{\sqrt{750}} \times \sqrt{\dfrac{12368 - 750}{12367}} = (247.62, 252.38)$
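The arithmetic of the police-crackdown problem can be reproduced with the Python standard library. This is a sketch, with the function name `mean_ci` a hypothetical choice; it takes the normal quantile from `statistics.NormalDist` rather than the rounded table value 1.645, and works in thousands of dollars.

```python
import math
from statistics import NormalDist

def mean_ci(xbar, s, n, conf, N=None):
    """Large-sample C.I. for the mean, xbar +/- z * s / sqrt(n).
    If the population size N is given, the standard error is multiplied
    by the finite population multiplier sqrt((N - n) / (N - 1))."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = s / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return xbar - z * se, xbar + z * se

# Values from the problem, in thousands of dollars.
without_fpm = mean_ci(250, 41, 750, 0.90)
with_fpm = mean_ci(250, 41, 750, 0.90, N=12368)

print("without FPM:", tuple(round(v, 2) for v in without_fpm))
print("with FPM   :", tuple(round(v, 2) for v in with_fpm))
```

Without the FPM the interval is roughly (247.54, 252.46). With the FPM it is roughly (247.61, 252.39), which agrees with the interval reported on the slide up to rounding. Here n/N is about 6%, above the 5% rule of thumb, so the multiplier narrows the interval noticeably.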
Correct interpretation of confidence level
$P[247.62 < \mu < 252.38] = 0.90$

Interesting observations about C.I.
• Interpretation of the confidence coefficient/level
– how should we interpret the probability statement? (confidence coefficient)
• Link between
– confidence coefficient/level
– accuracy (length of the C.I.)
– sample size
Length: $L = 2 z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$; half-length: $H = z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
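The link between confidence level, interval length and sample size can be explored numerically. Solving $H = z_{\alpha/2}\,\sigma/\sqrt{n}$ for n gives $n = (z_{\alpha/2}\,\sigma/H)^2$, rounded up. The sketch below assumes σ is known; the function names and the inputs (σ = 41, target half-length 3) are hypothetical.

```python
import math
from statistics import NormalDist

def z_value(conf):
    """Two-sided normal table value z_{alpha/2} for confidence level conf."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def ci_length(sigma, n, conf):
    """Length L = 2 * z * sigma / sqrt(n) of the C.I. for the mean."""
    return 2 * z_value(conf) * sigma / math.sqrt(n)

def required_n(sigma, half_length, conf):
    """Smallest n whose half-length z * sigma / sqrt(n) is at most half_length."""
    return math.ceil((z_value(conf) * sigma / half_length) ** 2)

# Hypothetical target: half-length of at most 3 with sigma = 41 at 90% confidence.
n_needed = required_n(41, 3, 0.90)
print(n_needed, ci_length(41, n_needed, 0.90) / 2)

# Raising the confidence level with n fixed lengthens the interval.
print(ci_length(41, 750, 0.90), ci_length(41, 750, 0.99))
```

With σ and the confidence level fixed, halving the length requires four times the sample size.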
Practice problem
Twelve bank tellers were randomly sampled and it was determined they made an average of 3.6 errors per day with a standard deviation of 0.42 errors. Construct a 90 percent confidence interval for the population mean of errors per day. Do you need to make any assumption about the number of errors bank tellers make?

If sample size is small?
• C.I. is valid only if the sampling is done from an (approximately) Normal population
• σ known? No further change: $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
• σ unknown? Use S as an estimate for σ, and use t-distribution with n-1 degrees of freedom (d.f.): $\dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim T_{n-1}$
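One way to work the bank-teller practice problem is sketched below. It assumes the number of errors per day is approximately normally distributed, and it hard-codes the t table value for 11 degrees of freedom with 5% in the upper tail (1.796), because the Python standard library has no t quantile function. The function name `t_interval` is a hypothetical choice.

```python
import math

# Table value t_{0.05, 11}: upper 5% point of the t-distribution with 11 d.f.
T_05_11 = 1.796

def t_interval(xbar, s, n, t_value):
    """Small-sample C.I. for the mean: xbar +/- t * S / sqrt(n).
    The caller supplies the t table value for n - 1 degrees of freedom."""
    margin = t_value * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Values from the practice problem: n = 12, mean 3.6, S = 0.42.
lo, hi = t_interval(3.6, 0.42, 12, T_05_11)
print(round(lo, 3), round(hi, 3))

# For comparison, the normal table value 1.645 gives a narrower interval.
z_lo, z_hi = t_interval(3.6, 0.42, 12, 1.645)
print(round(z_lo, 3), round(z_hi, 3))
```

The t-based interval is about (3.38, 3.82). The extra width relative to the z-based interval reflects the uncertainty from estimating σ by S with only 12 observations.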