1
Unit 3: Sampling Distributions,
Parameters and Parameter Estimates
What are the risks of excessive drinking at a department
party?
2
A picture is worth a thousand
words……
3
Inferential Statistics
Inferential statistics are used to estimate “parameters” in the
population from parameter estimates in a sample drawn from
that population.
In inferential statistics, we use these parameter estimates to
test hypotheses (predictions; null and alternative
hypotheses) about the size of the population parameter.
These predictions about the size of population parameters
typically map directly onto research questions about (causal)
relationships between variables (IVs and DV).
Answers from inferential statistics are probabilistic. In other
words, all answers have the potential to be wrong, and you
will provide an index of that probability along with your
results.
4
Populations
A population is any clearly defined set of objects or events
(people, occurrences, animals, etc.). Populations usually
represent all events in a particular class (e.g., all college
students, all alcoholics, all depressed people, all people). It
is often an abstract concept because in many/most
instances you will never have access to the entire
population.
For example, many of our studies may have the population
of all people as their target.
Nonetheless, researchers usually want to describe or draw
conclusions about populations (e.g., we don’t care whether some
new drug is an effective treatment for the 100 people in your
sample. Will it work, on average, for everyone we might
treat?).
5
(Population) Parameters
A parameter is a value used to describe a certain
characteristic of a population. It is usually unknown and
therefore has to be estimated.
For example, the population mean is a parameter that is often
used to indicate the average/typical value of a variable in the
population.
Within a population, a parameter is a fixed value which does
not vary within the population at the time of measurement
(e.g., the mean height of people in the US at the present
moment).
You typically can’t calculate these parameters directly
because you don’t have access to the entire population.
We use Greek letters to represent population parameters (μ,
σ, σ², β0, βj).
6
Samples & Parameter Estimates
A sample is a finite group of units (e.g., participants)
selected from the population of interest.
A sample is generally selected for study because the
population is too large to study in its entirety. We typically
have only one sample in a study.
We use the sample to estimate and test parameters in the
population.
These estimates are called parameter estimates.
We use Roman letters to represent sample parameter
estimates (X, s, s², b0, bj).
7
Sampling Error
Since a sample does not include all members of the
population, parameter estimates generally differ from
the parameters of the entire population (e.g., using the mean
height of a sample of 1000 people to estimate the mean height
of the US population).
The difference between the (sample) parameter estimate and
the (population) parameter is sampling error.
You will not be able to calculate the sampling error of your
parameter estimate directly because you don’t know the
value of the population parameter. However, you can
estimate it by probabilistic modeling of the hypothetical
sampling distribution for that parameter.
8
Hypothetical Sampling Distribution
A sampling distribution is the probability distribution of a
parameter estimate across all possible samples of size N taken
from a population.
A sampling distribution can be formed for the estimate of any
population parameter.
Each time you draw a sample of size N from a population,
you can calculate an estimate of that population parameter
from that sample.
Because of sampling error, these parameter estimates will
not exactly equal the population parameter. They will not
equal each other either. They will form a distribution.
A sampling distribution, like a population, is an abstract
concept that represents the outcome of repeated (infinite)
sampling. You will typically only have one sample.
9
What if we didn’t need samples?
Research question: How do inhabitants of a remote Pacific
island feel about the ocean?
Population size = 10,000
Dependent measure: Ocean Liking Scale scores range from
-100 (strongly dislike) to 100 (strongly like); 0 represents
neutral
Hypotheses: H0: μ = 0; Ha: μ ≠ 0
How would you answer this question if you had unlimited
resources (time, money, and patience!)?
Administer the Ocean Liking Scale to all 10,000
inhabitants in the population and calculate the
population mean score. Is it 0? If not, the inhabitants
are not neutral on average.
10
Ocean Liking Scale Scores in Full Population
> setwd("C:/Users/LocalUser/Desktop/GLM")
> d = lm.readDat('3_SamplingDistributions_Like.dat')
> str(d)
'data.frame': 10000 obs. of 2 variables:
$ Like0: num -23.61 -9.01 30.54 5.89 -9.16 ...
> lm.describeData(d)
var n mean sd median min max skew kurtosis
Like0 1 10000 0 23.67 0.1 -86.64 84.46 0 -0.03
11
Ocean Liking Scale Scores in Full Population
> windows() #quartz() for MAC users
> par('cex' = 1.5, 'lwd' = 2, 'font.axis'=1.5, 'font.lab' = 2)
> hist(d$Like0, col='yellow')
12
Parameter Estimation and Testing
What do you conclude?
Inhabitants of the island ARE neutral on average on the
Ocean Liking Scale; μ = 0
How confident are you about this conclusion?
Excluding issues of measurement of the scale (i.e.,
reliability), you are 100% confident that the population
mean score on this scale is 0 (μ = 0).
Of course, this approach to answering a research question is
not typical. Why? And how would you normally answer this
question?
You will very rarely have access to all scores in the
population. Instead, you have to use inferential
statistics to “infer” (estimate) the size of the population
parameter from a sample.
13
Obtain a Sample
You are a poor graduate student. All you can afford is N=10.
> dS = data.frame(Sample1 = sample(d$Like0,10))
> lm.describeData(dS$Sample1,1)
n mean sd min max
Sample1 10 2.4 24.97 -33.93 46.97
What do you conclude and why?
A sample mean of 2.40 is not 0. However, you know that
the sample mean will not match the population mean
exactly. How likely is it to get a sample mean of 2.40 if
the population mean is 0? (Think about it!)
14
Obtain a Sample
Your friend is a poor graduate student too. All she can afford
is N=10 too.
> dS$Sample2 = sample(d$Like0,10)
> lm.describeData(dS$Sample2,1)
n mean sd min max
sample2 10 1.04 22.42 -22.74 44.43
What do you conclude and why?
A sample mean of 1.04 is not 0. However, you know that
the sample mean will not match the population mean
exactly. It is more likely to get a sample mean of 1.04
than 2.40 if the population mean is 0, but you still don’t
know how likely either outcome is. What if she obtained
a sample with a mean of 30?
15
Sampling Distribution of the Mean
For the mean, you can think of the sampling distribution
conceptually as follows:
1. Imagine drawing many samples (let’s say 1000 samples,
but in theory the sampling distribution is based on infinitely
many samples) of N=10 participants (10 participants in each
sample) from your population
2. Next, calculate the mean for each of these samples of 10
participants
3. Finally, create a histogram (or density plot) of these
sample means
You can construct a sampling distribution for any sample
statistic (e.g., mean, s, min, max, r, b0, b1)
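The three steps on this slide can be sketched in R. This is only an illustration: the population below is simulated from a normal distribution (an assumption standing in for the Like0 data file), and the names `population`, `n_samples`, and `sample_means` are made up for this example.

```r
set.seed(123)
# Assumption: a simulated stand-in for the population of 10,000 scores
population <- rnorm(10000, mean = 0, sd = 23.67)

n_samples <- 1000   # number of samples to draw
n <- 10             # participants per sample

# Steps 1 and 2: draw each sample and calculate its mean
sample_means <- replicate(n_samples, mean(sample(population, n)))

# Step 3: histogram of the sample means
hist(sample_means, main = "Sampling distribution of the mean (N = 10)",
     xlab = "Sample mean")

mean(sample_means)  # close to the population mean
sd(sample_means)    # close to sd(population) / sqrt(n)
```

Because the samples are random, the exact values change with the seed; only the general pattern should be expected to hold.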
16
1000 Samples of N=10 OLS Scores
n mean sd min max
sample1 10 2.40 24.97 -33.93 46.97
sample2 10 1.04 22.42 -22.74 44.43
sample3 10 -2.52 25.39 -47.83 37.05
sample4 10 -0.08 22.78 -32.19 34.35
sample5 10 -13.48 21.14 -42.61 15.04
sample6 10 2.77 26.30 -49.92 45.56
sample7 10 -9.56 21.35 -38.03 25.86
sample8 10 -5.32 16.74 -30.69 25.57
sample9 10 5.89 30.28 -55.08 44.32
sample10 10 4.51 30.30 -43.83 56.36
sample11 10 5.65 28.41 -55.83 43.16
sample12 10 8.23 23.62 -37.88 54.17
sample13 10 1.14 23.90 -29.68 48.80
sample14 10 -9.44 27.63 -47.12 32.19
sample15 10 -6.20 24.50 -51.34 33.58
...
sample999 10 -6.33 22.38 -31.03 36.77
sample1000 10 15.22 22.12 -19.57 59.47
Descriptives for each of 1000 samples of N=10
17
Sampling Distribution of the Mean
n mean sd median min max skew kurtosis
mean 1000 0.02 7.48 -0.14 -27.5 22.25 -0.03 0.09
Descriptives for 1000 sample means of N=10
NOTE: In your research, you don’t form a sampling
distribution. You (typically) only have one sample.
18
Raw Score Distribution vs. Sampling Distribution
NOTE: The distinction between the raw score distribution and
the sampling distribution is very important to keep clear in
your mind!
19
Sampling Distribution of the Mean
What will the mean of the sample means be? In other
words, what is the mean of the sampling distribution?
The mean of the sample means (i.e., the mean of the
sampling distribution) will equal the population mean of
raw scores on the dependent measure. This is
important b/c it indicates that the sample mean is an
unbiased estimator of the population mean.
20
Sampling Distribution of the Mean
The mean is an unbiased estimator:
The mean of the sample means will equal the mean of the
population. Therefore individual sample means will neither
systematically under- nor overestimate the population mean.
The sample variance (s²; with n-1 denominator) is also an
unbiased estimator of the population variance (σ²). In other
words, the mean of the sample s² values will approximate the
population variance. The sample s is negatively biased.
Sample (N=10) means
n mean sd median min max skew kurtosis
mean 1000 0.02 7.48 -0.14 -27.5 22.25 -0.03 0.09
Raw Ocean Liking scores
n mean sd median min max skew kurtosis
Like0 10000 0 23.67 0.1 -86.64 84.46 0 -0.03
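The claims about s² and s can be checked with a small simulation. This is a sketch under stated assumptions: the samples are drawn from a normal population whose SD is set to the 23.67 reported in the descriptives, and the variable names are illustrative only.

```r
set.seed(42)
sigma <- 23.67   # assumed population SD (taken from the descriptives)
n <- 10

# Draw 20,000 samples of N = 10 from a normal population (an assumption)
samples <- replicate(20000, rnorm(n, mean = 0, sd = sigma))  # 10 x 20000 matrix

s2 <- apply(samples, 2, var)   # sample variances (n - 1 denominator)
s  <- sqrt(s2)                 # sample standard deviations

mean(s2) / sigma^2             # near 1: s^2 is unbiased for sigma^2
mean(s) / sigma                # below 1: s underestimates sigma on average
```

The second ratio falls a few percent below 1 at N = 10, which is the negative bias in s that the slide refers to.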
21
Sampling Distribution of the Mean
Will all of the sample means be the same?
No, there was a distribution of means that varied from each
other. The mean of the sampling distribution was the
population mean but the standard deviation was not zero
n mean sd median min max skew kurtosis
mean 1000 0.02 7.48 -0.14 -27.5 22.25 -0.03 0.09
22
Standard Error (SE)
The standard deviation of the sampling distribution (i.e., the
standard deviation of the infinitely many possible sample
means) is equal to:
$$\frac{\sigma}{\sqrt{N_{sample}}}$$
where σ is the standard deviation of the population of raw
scores.
This variability in the sampling distribution is due to
sampling error.
Therefore, b/c we use sample statistics (parameter
estimates) to estimate population parameters, we would like
to minimize sampling error.
The standard deviation of the sampling distribution for a
statistic has a technical name. It is called the
standard error of the statistic. Here, we are talking about the
standard error of the mean.
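As a quick check of the formula, the standard error of the mean for the Ocean Liking example can be computed from the population SD (23.67) and the sample size (10) reported earlier. The variable names are illustrative.

```r
sigma <- 23.67          # population SD of Like0, from the descriptives
n <- 10                 # sample size
se <- sigma / sqrt(n)   # standard error of the mean
se                      # about 7.49
```

This is close to the SD of 7.48 observed across the 1000 sample means, as the formula predicts.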
23
Standard Error
What factors affect the size of the sampling error of the mean
(i.e., the standard error)?
$$\frac{\sigma}{\sqrt{N_{sample}}}$$
The standard deviation of the population raw scores and the
sample size
24
Factors that Affect the Standard Error (SE)
Variation among raw scores for a variable in the population
is broadly caused by two factors. What are they?
(a) Individual differences
(b) Measurement error (the opposite of reliability)
What would happen to the SE if there was no variation in
population scores?
The SE would be 0 b/c no matter which participants you
sampled, they would all have the same scores.
What is the relationship between population variability (σ)
and SE?
As the variability of the variable increases in the population,
the SE increases.
25
Factors that Affect the Standard Error (SE)
What is the relationship between sample size and SE?
As the sample size increases, the SE for the statistic will
decrease.
What would happen if the samples contained only 1
participant?
If each sample contained only 1 participant, the SE would be
equal to the variation (σ) observed within the population.
What would the SE be if the sample size = population size?
If the sample contained ALL participants from the population,
the SE would be equal to 0 because each sample mean
would have exactly the same value as the overall population
mean (b/c every sample would contain the same scores).
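The relationship between sample size and SE, including the two extreme cases, can be illustrated by simulation. This sketch assumes a small simulated population of 1,000 scores (so that sampling everyone is feasible) and samples without replacement; `empirical_se` is a helper written for this example, not a library function.

```r
set.seed(7)
# Assumption: a small simulated population so that sampling everyone is feasible
population <- rnorm(1000, mean = 0, sd = 23.67)

# Empirical SE: SD of the means of many samples drawn without replacement
empirical_se <- function(n, reps = 2000) {
  sd(replicate(reps, mean(sample(population, n))))
}

sizes <- c(1, 10, 100, 1000)
se_by_n <- sapply(sizes, empirical_se)
data.frame(n = sizes, se = round(se_by_n, 2))
```

With N = 1 the SE is roughly the population SD, and with N equal to the population size it is essentially 0. Note that the simple formula σ / √N does not reach 0 in that last case; it describes sampling from a population that is very large relative to the sample.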
26
Shape of the Sampling Distribution
Central Limit Theorem:
The shape of the sampling distribution of the mean approaches
normal as N increases.
It is roughly normal even for moderate sample sizes, assuming
that the original distribution isn’t really weird (i.e., very
non-normal).
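A small simulation can make the Central Limit Theorem concrete. The sketch below assumes an exponential distribution as the skewed population (a choice made for this example, not taken from the slides) and uses a simple moment-based skewness helper defined here rather than a package function.

```r
set.seed(99)

# Simple moment-based skewness (helper defined here, not from a package)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Means of many samples of size n drawn from a skewed (exponential) population
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

sizes <- c(2, 10, 50)
skew_by_n <- sapply(sizes, function(n) skewness(sample_means(n)))
round(setNames(skew_by_n, paste0("N=", sizes)), 2)
```

The skewness of the sample means shrinks toward 0 (the value for a normal distribution) as N grows, even though the raw scores stay strongly skewed.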
27
Normal Pop and Various Sampling Distributions
NOTES: Population size = 100,000; Simulated 10,000 samples
28
Uniform Pop and Various Sampling Distributions
29
Skewed Pop and Various Sampling Distributions
NOTE: x-axis scale changes across figures on this slide
30
An Important Normal Distribution: Z-scores
Z-scores are normally distributed scores with a mean of 0
and a standard deviation of 1.
You can therefore think of a z-score as telling you the
position of the score in terms of standard deviations above
(positive z) or below (negative z) the mean.
The probability distribution is known for z-scores.
[Figure: standard normal distribution with tail areas of 16%, 2.5%, and 0.5% marked in each tail]
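The tail percentages on this slide can be reproduced with R's normal distribution functions. The z cut-offs used below (1, 1.96, and 2.58) are an assumption about what the figure marks; they are the conventional values that give approximately these tail areas.

```r
# Upper-tail areas beyond selected z cut-offs (cut-offs assumed, see above)
z_cutoffs  <- c(1, 1.96, 2.58)
tail_areas <- pnorm(z_cutoffs, lower.tail = FALSE)
round(tail_areas, 3)   # roughly 0.159, 0.025, 0.005

# Going the other way: the z cut-off that leaves a given area in the upper tail
cutoffs_from_area <- qnorm(c(0.16, 0.025, 0.005), lower.tail = FALSE)
round(cutoffs_from_area, 2)
```

Because the distribution is symmetric, the same areas apply to the lower tail at the corresponding negative z values.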
31
Probability of Parameter estimate given H0
How could you use the z-score distribution to determine
the probability of obtaining a sample mean (parameter
estimate) of 2.40 if you draw a sample of N=10 from a
population of Ocean Liking scores with a population mean
(parameter) of 0?
Think about it……
32
Hypothetical Sampling Distribution for H0
If H0 is true, the sampling distribution has a mean of 0 and a
standard deviation of $\sigma / \sqrt{N_{sample}} = 23.7 / \sqrt{10} = 7.5$
33
Hypothetical Sampling Distribution for H0
If H0 is true and this is the sampling distribution (in blue),
how likely is it to get a sample mean of 2.4 or more extreme?
Pretty likely…..
But we can do better than that…….
34
Our first inferential test: the z-test
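$$z = \frac{2.4 - 0}{7.5} = 0.32; \quad p = .749$$
> pnorm(0.32, mean=0, sd=1, lower.tail=FALSE) * 2
[1] 0.7489683
[Figure: sampling distribution under H0 with 37.4% of the area in each tail beyond the observed sample mean]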
35
t vs. z
$$z = \frac{2.4 - 0}{7.5} = 0.32$$
Where did we get the 2.4 from in our z test?
Our sample mean from our study. This is our parameter
estimate of the population mean of OLS (Like0) scores.
Where did we get the 0 from in our z test?
This is the mean of the sampling distribution of OLS scores
if H0 is true.
Where did we get the 7.5 from in our z test and what is the
problem with this?
This was our estimate of the standard deviation of the
sampling distribution, $\sigma / \sqrt{N_{sample}}$.
We do not know σ.
36
t vs. z
How can we estimate σ?
We can use our sample standard deviation (s), but s is a
negatively biased parameter estimate. On average, it will
underestimate σ.
So what do we do?
We account for this underestimation of σ, and therefore of the
standard deviation (standard error) of the sampling
distribution, by using the t distribution rather than the z
distribution to calculate the probability of our parameter
estimate if H0 is true.
The t distribution is slightly wider, particularly for small
sample sizes, to correct for our underestimate of the
standard deviation.
37
Our second inferential test: t-test
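$$t(df) = \frac{\text{parameter estimate} - \text{parameter under } H_0}{\text{standard error of parameter estimate}}$$
where the SE is estimated using s from the sample data
df = N – P = 10 – 1 = 9 (N = sample size, P = number of parameters estimated)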
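Applied to the first sample from the earlier slide (mean 2.4, s = 24.97, N = 10), the t-test can be worked through by hand in R. This is a sketch of the arithmetic only; the variable names are illustrative.

```r
x_bar <- 2.4      # sample mean (Sample1)
s     <- 24.97    # sample standard deviation (Sample1)
n     <- 10       # sample size
mu_0  <- 0        # population mean under H0

se_hat <- s / sqrt(n)               # estimated standard error of the mean
t_stat <- (x_bar - mu_0) / se_hat   # t statistic
df     <- n - 1                     # N minus number of estimated parameters
p_val  <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)  # two-tailed p-value

round(c(se = se_hat, t = t_stat, df = df, p = p_val), 3)
```

With the raw scores available, `t.test(x, mu = 0)` carries out the same one-sample test directly.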
38
t vs. z
The bias in s decreases with increasing N. Therefore, t
approaches z with larger sample sizes.
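One way to see t approaching z is to compare two-tailed 5% critical values across degrees of freedom. The particular df values below are chosen only for illustration.

```r
df     <- c(4, 9, 29, 99, 999)
t_crit <- qt(0.975, df = df)   # two-tailed 5% critical values of t
z_crit <- qnorm(0.975)         # corresponding z critical value (about 1.96)
data.frame(df = df, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3))
```

The t critical value is noticeably larger than 1.96 at df = 9 (about 2.26) and nearly identical to it by df = 999.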
39
Null Hypothesis Significance Testing (NHST)
1. Divide reality regarding the size of the population parameter
into two non-overlapping possibilities (null hypothesis and
alternative hypothesis).
2. Assume that the null hypothesis is true.
3. Collect data.
4. Calculate the probability (p-value) of obtaining your parameter
estimate (or a more extreme estimate) given your assumption
(i.e., that the null hypothesis is true).
5. Compare this probability to some cut-off value (alpha level).
6. (a) If the probability is less than the cut-off value, reject
the null hypothesis in favor of the alternative hypothesis.
6. (b) If the probability is not less than the cut-off value, fail
to reject the null hypothesis.
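The NHST steps can be traced in a short R sketch. The data here are simulated under the assumption that H0 is true (population mean of 0), alpha = .05 is an assumed conventional cut-off, and the names `alpha`, `scores`, and `decision` are illustrative.

```r
set.seed(2024)
alpha <- 0.05   # step 5: cut-off value chosen in advance (assumed here)

# Assumption: simulated population of scores in which H0 (mu = 0) is true
population <- rnorm(10000, mean = 0, sd = 23.67)
scores <- sample(population, 10)   # step 3: collect data (N = 10)

# Steps 1, 2 and 4: H0: mu = 0 vs. Ha: mu != 0; p-value computed assuming H0
result <- t.test(scores, mu = 0, alternative = "two.sided")

# Step 6: compare the p-value with alpha
decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```

The decision depends on the particular sample drawn, which is why the conclusion from any single study is probabilistic.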

More Related Content

Similar to Sampling Distributions

Z and t_tests
Z and t_testsZ and t_tests
Z and t_testseducation
 
Basic of Statistical Inference Part-I
Basic of Statistical Inference Part-IBasic of Statistical Inference Part-I
Basic of Statistical Inference Part-IDexlab Analytics
 
Confidence Intervals in the Life Sciences PresentationNamesS.docx
Confidence Intervals in the Life Sciences PresentationNamesS.docxConfidence Intervals in the Life Sciences PresentationNamesS.docx
Confidence Intervals in the Life Sciences PresentationNamesS.docxmaxinesmith73660
 
Estimation in statistics
Estimation in statisticsEstimation in statistics
Estimation in statisticsRabea Jamal
 
HW1_STAT206.pdfStatistical Inference II J. Lee Assignment.docx
HW1_STAT206.pdfStatistical Inference II J. Lee Assignment.docxHW1_STAT206.pdfStatistical Inference II J. Lee Assignment.docx
HW1_STAT206.pdfStatistical Inference II J. Lee Assignment.docxwilcockiris
 
Normal and standard normal distribution
Normal and standard normal distributionNormal and standard normal distribution
Normal and standard normal distributionAvjinder (Avi) Kaler
 
Statistics by DURGESH JHARIYA OF jnv,bn,jbp
Statistics by DURGESH JHARIYA OF jnv,bn,jbpStatistics by DURGESH JHARIYA OF jnv,bn,jbp
Statistics by DURGESH JHARIYA OF jnv,bn,jbpDJJNV
 
Chp11 - Research Methods for Business By Authors Uma Sekaran and Roger Bougie
Chp11  - Research Methods for Business By Authors Uma Sekaran and Roger BougieChp11  - Research Methods for Business By Authors Uma Sekaran and Roger Bougie
Chp11 - Research Methods for Business By Authors Uma Sekaran and Roger BougieHassan Usman
 
Lecture 5 Sampling distribution of sample mean.pptx
Lecture 5 Sampling distribution of sample mean.pptxLecture 5 Sampling distribution of sample mean.pptx
Lecture 5 Sampling distribution of sample mean.pptxshakirRahman10
 
Sampling distribution
Sampling distributionSampling distribution
Sampling distributionswarna dey
 
Lect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadLect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadRione Drevale
 
Review of Chapters 1-5.ppt
Review of Chapters 1-5.pptReview of Chapters 1-5.ppt
Review of Chapters 1-5.pptNobelFFarrar
 
Hypothesis testing: A single sample test
Hypothesis testing: A single sample testHypothesis testing: A single sample test
Hypothesis testing: A single sample testUmme Salma Tuli
 
Sampling Distribution and Simulation in R
Sampling Distribution and Simulation in RSampling Distribution and Simulation in R
Sampling Distribution and Simulation in RPremier Publishers
 

Similar to Sampling Distributions (20)

Z and t_tests
Z and t_testsZ and t_tests
Z and t_tests
 
Applied statistics part 1
Applied statistics part 1Applied statistics part 1
Applied statistics part 1
 
Statistics excellent
Statistics excellentStatistics excellent
Statistics excellent
 
Basic of Statistical Inference Part-I
Basic of Statistical Inference Part-IBasic of Statistical Inference Part-I
Basic of Statistical Inference Part-I
 
Confidence Intervals in the Life Sciences PresentationNamesS.docx
Confidence Intervals in the Life Sciences PresentationNamesS.docxConfidence Intervals in the Life Sciences PresentationNamesS.docx
Confidence Intervals in the Life Sciences PresentationNamesS.docx
 
Sample
SampleSample
Sample
 
Estimation in statistics
Estimation in statisticsEstimation in statistics
Estimation in statistics
 
HW1_STAT206.pdfStatistical Inference II J. Lee Assignment.docx
HW1_STAT206.pdfStatistical Inference II J. Lee Assignment.docxHW1_STAT206.pdfStatistical Inference II J. Lee Assignment.docx
HW1_STAT206.pdfStatistical Inference II J. Lee Assignment.docx
 
Normal and standard normal distribution
Normal and standard normal distributionNormal and standard normal distribution
Normal and standard normal distribution
 
Statistics by DURGESH JHARIYA OF jnv,bn,jbp
Statistics by DURGESH JHARIYA OF jnv,bn,jbpStatistics by DURGESH JHARIYA OF jnv,bn,jbp
Statistics by DURGESH JHARIYA OF jnv,bn,jbp
 
Chp11 - Research Methods for Business By Authors Uma Sekaran and Roger Bougie
Chp11  - Research Methods for Business By Authors Uma Sekaran and Roger BougieChp11  - Research Methods for Business By Authors Uma Sekaran and Roger Bougie
Chp11 - Research Methods for Business By Authors Uma Sekaran and Roger Bougie
 
Lecture 5 Sampling distribution of sample mean.pptx
Lecture 5 Sampling distribution of sample mean.pptxLecture 5 Sampling distribution of sample mean.pptx
Lecture 5 Sampling distribution of sample mean.pptx
 
Basic statistics
Basic statisticsBasic statistics
Basic statistics
 
Sampling distribution
Sampling distributionSampling distribution
Sampling distribution
 
Lect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadLect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spread
 
Probability & Samples
Probability & SamplesProbability & Samples
Probability & Samples
 
Unit 3 Sampling
Unit 3 SamplingUnit 3 Sampling
Unit 3 Sampling
 
Review of Chapters 1-5.ppt
Review of Chapters 1-5.pptReview of Chapters 1-5.ppt
Review of Chapters 1-5.ppt
 
Hypothesis testing: A single sample test
Hypothesis testing: A single sample testHypothesis testing: A single sample test
Hypothesis testing: A single sample test
 
Sampling Distribution and Simulation in R
Sampling Distribution and Simulation in RSampling Distribution and Simulation in R
Sampling Distribution and Simulation in R
 

More from RINUSATHYAN

Frameworks provide structure. The core objective of the Big Data Framework is...
Frameworks provide structure. The core objective of the Big Data Framework is...Frameworks provide structure. The core objective of the Big Data Framework is...
Frameworks provide structure. The core objective of the Big Data Framework is...RINUSATHYAN
 
Data analytics for engineers- introduction
Data analytics for engineers-  introductionData analytics for engineers-  introduction
Data analytics for engineers- introductionRINUSATHYAN
 
Neural-Networks.ppt
Neural-Networks.pptNeural-Networks.ppt
Neural-Networks.pptRINUSATHYAN
 
artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptRINUSATHYAN
 

More from RINUSATHYAN (14)

Frameworks provide structure. The core objective of the Big Data Framework is...
Frameworks provide structure. The core objective of the Big Data Framework is...Frameworks provide structure. The core objective of the Big Data Framework is...
Frameworks provide structure. The core objective of the Big Data Framework is...
 
Data analytics for engineers- introduction
Data analytics for engineers-  introductionData analytics for engineers-  introduction
Data analytics for engineers- introduction
 
apriori.pdf
apriori.pdfapriori.pdf
apriori.pdf
 
Neural-Networks.ppt
Neural-Networks.pptNeural-Networks.ppt
Neural-Networks.ppt
 
artificial-neural-networks-rev.ppt
artificial-neural-networks-rev.pptartificial-neural-networks-rev.ppt
artificial-neural-networks-rev.ppt
 
forecasting
forecastingforecasting
forecasting
 
regression
regressionregression
regression
 
Decision tree
Decision treeDecision tree
Decision tree
 
disc brake
disc brakedisc brake
disc brake
 
CLUTCH
CLUTCHCLUTCH
CLUTCH
 
rpt.pdf
rpt.pdfrpt.pdf
rpt.pdf
 
RPT- ppt.pdf
RPT- ppt.pdfRPT- ppt.pdf
RPT- ppt.pdf
 
FMS.pdf
FMS.pdfFMS.pdf
FMS.pdf
 
ERP ppt.ppt
ERP ppt.pptERP ppt.ppt
ERP ppt.ppt
 

Recently uploaded

chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 

Recently uploaded (20)

chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 

Sampling Distributions

  • 1. 1 Unit 3: Sampling Distributions, Parameters and Parameter Estimates
  • 2. What are the risks of excessive drinking at a department party? 2 A picture is worth a thousand words……
  • 3. 3 Inferential Statistics Inferential statistics are used estimate “parameters” in the population from parameter estimates in a sample drawn from that population. In inferential statistics, we use these parameter estimates to test hypotheses (predictions; Null and alternative hypotheses) about the size of the population parameter. These predictions about the size of populations parameters typically map directly onto research questions about (causal) relationships between variables (IVs and DV) Answers from inferential statistical are probabilistic. In other words, all answers have the potential to be wrong and you will provide an index of that probability along with your results.
  • 4. 4 Populations A population is any clearly defined set of objects or events (people, occurrences, animals, etc.). Populations usually represent all events in a particular class (e.g., all college students, all alcoholics, all depressed people, all people). It is often an abstract concept because in many/most instances you will never have access to the entire population. For example, many of our studies may have the population of all people as its target. Nonetheless, researchers usually want to describe or draw conclusions about populations. (e.g., we don’t care if some new drug is an effective treatment for 100 people in your sample- Will it work, on average, for everyone we might treat?)
  • 5. (Population) Parameters: A parameter is a value used to describe a certain characteristic of a population. It is usually unknown and therefore has to be estimated. For example, the population mean is a parameter that is often used to indicate the average/typical value of a variable in the population. Within a population, a parameter is a fixed value that does not vary at the time of measurement (e.g., the mean height of people in the US at the present moment). You typically cannot calculate these parameters directly because you don’t have access to the entire population. We use Greek letters to represent population parameters (μ, σ, σ², β0, βj).
  • 6. Samples & Parameter Estimates: A sample is a finite group of units (e.g., participants) selected from the population of interest. A sample is generally selected for study because the population is too large to study in its entirety. We typically have only one sample in a study. We use the sample to estimate and test parameters in the population. These estimates are called parameter estimates. We use Roman letters to represent sample parameter estimates (X̄, s, s², b0, bj).
  • 7. Sampling Error: Since a sample does not include all members of the population, parameter estimates generally differ from the parameters of the entire population (e.g., when the mean height of a sample of 1000 people is used to estimate the mean height of the US population). The difference between the (sample) parameter estimate and the (population) parameter is sampling error. You will not be able to calculate the sampling error of your parameter estimate directly because you don’t know the value of the population parameter. However, you can estimate it by probabilistic modeling of the hypothetical sampling distribution for that parameter estimate.
  • 8. Hypothetical Sampling Distribution: A sampling distribution is the probability distribution of a parameter estimate across all possible samples of size N taken from a population. A sampling distribution can be formed for the estimate of any population parameter. Each time you draw a sample of size N from a population, you can calculate an estimate of that population parameter from that sample. Because of sampling error, these parameter estimates will not exactly equal the population parameter. They will not equal each other either. They will form a distribution. A sampling distribution, like a population, is an abstract concept that represents the outcome of repeated (infinite) sampling. You will typically have only one sample.
  • 9. What if we didn’t need samples? Research question: How do inhabitants of a remote Pacific island feel about the ocean? Population size = 10,000. Dependent measure: Ocean Liking Scale (OLS) scores range from -100 (strongly dislike) to 100 (strongly like); 0 represents neutral. Hypotheses: H0: μ = 0; Ha: μ ≠ 0. How would you answer this question if you had unlimited resources (time, money, and patience)? Administer the Ocean Liking Scale to all 10,000 inhabitants in the population and calculate the population mean score. Is it 0? If not, the inhabitants are not neutral on average.
  • 10. Ocean Liking Scale Scores in Full Population
      > setwd("C:/Users/LocalUser/Desktop/GLM")
      > d = lm.readDat('3_SamplingDistributions_Like.dat')
      > str(d)
      'data.frame': 10000 obs. of 2 variables:
       $ Like0: num -23.61 -9.01 30.54 5.89 -9.16 ...
      > lm.describeData(d)
            var     n mean    sd median    min   max skew kurtosis
      Like0   1 10000    0 23.67    0.1 -86.64 84.46    0    -0.03
  • 11. Ocean Liking Scale Scores in Full Population
      > windows()  # quartz() for Mac users
      > par('cex' = 1.5, 'lwd' = 2, 'font.axis'=1.5, 'font.lab' = 2)
      > hist(d$Like0, col='yellow')
  • 12. Parameter Estimation and Testing: What do you conclude? Inhabitants of the island ARE neutral on average on the Ocean Liking Scale; μ = 0. How confident are you about this conclusion? Excluding issues of measurement of the scale (i.e., reliability), you are 100% confident that the population mean score on this scale is 0 (μ = 0). Of course, this approach to answering a research question is not typical. Why? And how would you normally answer this question? You will very rarely have access to all scores in the population. Instead, you have to use inferential statistics to “infer” (estimate) the size of the population parameter from a sample.
  • 13. Obtain a Sample: You are a poor graduate student. All you can afford is N = 10.
      > dS = data.frame(Sample1 = sample(d$Like0,10))
      > lm.describeData(dS$Sample1,1)
               n mean    sd    min   max
      Sample1 10  2.4 24.97 -33.93 46.97
      A sample mean of 2.40 is not 0. However, you know that the sample mean will not match the population mean exactly. How likely is it to get a sample mean of 2.40 if the population mean is 0? (Think about it!) What do you conclude and why?
  • 14. Obtain a Sample: Your friend is a poor graduate student too. All she can afford is N = 10 as well.
      > dS$Sample2 = sample(d$Like0,10)
      > lm.describeData(dS$Sample2,1)
               n mean    sd    min   max
      sample2 10 1.04 22.42 -22.74 44.43
      A sample mean of 1.04 is not 0. However, you know that the sample mean will not match the population mean exactly. It is more likely to get a sample mean of 1.04 than 2.40 if the population mean is 0, but you still don’t know how likely either outcome is. What if she obtained a sample with a mean of 30? What do you conclude and why?
  • 15. Sampling Distribution of the Mean: You can construct a sampling distribution for any sample statistic (e.g., mean, s, min, max, r, b0, b1). For the mean, you can think of the sampling distribution conceptually as follows:
      1. Imagine drawing many samples (let’s say 1000 samples, although in theory the sampling distribution is built from infinitely many) of N = 10 participants (10 participants in each sample) from your population.
      2. Next, calculate the mean for each of these samples of 10 participants.
      3. Finally, create a histogram (or density plot) of these sample means.
  • 16. 1000 Samples of N=10 OLS Scores. Descriptives for each of 1000 samples of N=10:
                   n    mean     sd     min    max
      sample1     10    2.40  24.97  -33.93  46.97
      sample2     10    1.04  22.42  -22.74  44.43
      sample3     10   -2.52  25.39  -47.83  37.05
      sample4     10   -0.08  22.78  -32.19  34.35
      sample5     10  -13.48  21.14  -42.61  15.04
      sample6     10    2.77  26.30  -49.92  45.56
      sample7     10   -9.56  21.35  -38.03  25.86
      sample8     10   -5.32  16.74  -30.69  25.57
      sample9     10    5.89  30.28  -55.08  44.32
      sample10    10    4.51  30.30  -43.83  56.36
      sample11    10    5.65  28.41  -55.83  43.16
      sample12    10    8.23  23.62  -37.88  54.17
      sample13    10    1.14  23.90  -29.68  48.80
      sample14    10   -9.44  27.63  -47.12  32.19
      sample15    10   -6.20  24.50  -51.34  33.58
      ...
      sample999   10   -6.33  22.38  -31.03  36.77
      sample1000  10   15.22  22.12  -19.57  59.47
  • 17. Sampling Distribution of the Mean. Descriptives for 1000 sample means of N=10:
              n  mean    sd  median    min    max   skew  kurtosis
      mean 1000  0.02  7.48   -0.14  -27.5  22.25  -0.03      0.09
      NOTE: In your research, you don’t form a sampling distribution. You (typically) have only one sample.
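The three conceptual steps for building a sampling distribution of the mean can be sketched in base R. This is a minimal sketch, not the deck's own code: the data file from slide 10 is not available here, so `pop` is a hypothetical stand-in simulated with `rnorm()` to roughly match the reported population mean (0) and SD (23.67), and the seed is arbitrary.

```r
set.seed(123)  # arbitrary seed so the sketch is reproducible

# Hypothetical stand-in for d$Like0 (assumption): 10,000 simulated scores
pop <- rnorm(10000, mean = 0, sd = 23.67)

n_samples <- 1000  # number of samples to draw
n <- 10            # participants per sample

# Steps 1 and 2: draw each sample without replacement and keep its mean
sample_means <- replicate(n_samples, mean(sample(pop, n)))

# Descriptives of the simulated sampling distribution
print(mean(sample_means))  # should land close to mean(pop)
print(sd(sample_means))    # should land close to sd(pop) / sqrt(n)

# Step 3: histogram of the sample means
hist(sample_means, col = "yellow",
     main = "Simulated sampling distribution of the mean (N = 10)")
```

Because the population is simulated, the numbers will not reproduce the table on slide 17 exactly, but the mean of the sample means should sit near the population mean and their SD near 23.67 / sqrt(10).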
  • 18. Raw Score Distribution vs. Sampling Distribution. NOTE: The distinction between the raw score distribution and the sampling distribution is very important to keep clear in your mind!
  • 19. Sampling Distribution of the Mean: What will the mean of the sample means be? In other words, what is the mean of the sampling distribution? The mean of the sample means (i.e., the mean of the sampling distribution) will equal the population mean of raw scores on the dependent measure. This is important because it indicates that the sample mean is an unbiased estimator of the population mean.
  • 20. Sampling Distribution of the Mean: The mean is an unbiased estimator: the mean of the sample means will equal the mean of the population. Therefore, individual sample means will neither systematically under- nor overestimate the population mean. The sample variance (s²; with n-1 denominator) is also an unbiased estimator of the population variance (σ²). In other words, the mean of the sample s² values will approximate the population variance. The sample s, however, is negatively biased.
      Sample (N=10) means:
               n  mean     sd  median     min    max   skew  kurtosis
      mean  1000  0.02   7.48   -0.14   -27.5  22.25  -0.03      0.09
      Raw Ocean Liking scores:
                n  mean     sd  median     min    max   skew  kurtosis
      Like0 10000     0  23.67     0.1  -86.64  84.46      0     -0.03
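The claims about bias on this slide can be checked by simulation. The sketch below is illustrative only: `pop` is again a hypothetical simulated population (an assumption, not the deck's data file), and 5000 repetitions is an arbitrary choice.

```r
set.seed(1)  # arbitrary seed

# Hypothetical population of raw scores (assumption)
pop <- rnorm(10000, mean = 0, sd = 23.67)
sigma2 <- mean((pop - mean(pop))^2)  # population variance (N denominator)

# Draw 5000 samples of N = 10; each column of the matrix is one sample
samples <- replicate(5000, sample(pop, 10))

s2 <- apply(samples, 2, var)  # var() uses the n - 1 denominator
s  <- sqrt(s2)                # sample standard deviations

# Ratios near 1 indicate little or no bias; below 1 indicates underestimation
print(mean(s2) / sigma2)      # expected to be close to 1
print(mean(s) / sqrt(sigma2)) # expected to be somewhat below 1
```

The second ratio falling below 1 is the negative bias in s that later motivates using the t distribution in place of z.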
  • 21. Sampling Distribution of the Mean: Will all of the sample means be the same? No, there was a distribution of means that varied from each other. The mean of the sampling distribution was the population mean, but the standard deviation was not zero.
               n  mean     sd  median     min    max   skew  kurtosis
      mean  1000  0.02   7.48   -0.14   -27.5  22.25  -0.03      0.09
  • 22. Standard Error (SE): The standard deviation of the sampling distribution (i.e., the standard deviation of the infinite set of sample means) is equal to $\sigma / \sqrt{N_{sample}}$, where σ is the standard deviation of the population of raw scores. This variability in the sampling distribution is due to sampling error. Therefore, because we use sample statistics (parameter estimates) to estimate population parameters, we would like to minimize sampling error. The standard deviation of the sampling distribution for a parameter estimate has a technical name: the standard error of the statistic. Here, we are talking about the standard error of the mean.
  • 23. Standard Error: What factors affect the size of the sampling error of the mean (i.e., the standard error, $\sigma / \sqrt{N_{sample}}$)? The standard deviation of the population raw scores (σ) and the sample size (N).
  • 24. Factors that Affect the Standard Error (SE): Variation among raw scores for a variable in the population is broadly caused by two factors. What are they? (a) Individual differences and (b) measurement error (the opposite of reliability). What would happen to the SE if there was no variation in population scores? The SE would be 0 because no matter which participants you sampled, they would all have the same scores. What is the relationship between population variability (σ) and SE? As the variability of the variable increases in the population, the SE increases.
  • 25. Factors that Affect the Standard Error (SE): What is the relationship between sample size and SE? As the sample size increases, the SE for the statistic will decrease. What would happen if the samples contained only 1 participant? If each sample contained only 1 participant, the SE would be equal to the variation (σ) observed within the population. What would the SE be if the sample size = population size? If the sample contained ALL participants from the population, the SE would be equal to 0 because each sample mean would have exactly the same value as the overall population mean (every sample would contain the same scores).
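A quick calculation shows how the standard error shrinks with sample size. The sketch below uses σ = 23.67 and a population of 10,000 from slide 10. Note that the plain formula σ / √N never reaches exactly 0; the finite population correction used for `se_fpc` is not on the slides and is added here as an assumption, to show why the SE is 0 when the sample is the whole population and sampling is without replacement.

```r
sigma <- 23.67   # population SD reported on slide 10
N_pop <- 10000   # population size from slide 9

n  <- c(1, 10, 100, 1000, 10000)  # candidate sample sizes
se <- sigma / sqrt(n)             # standard error of the mean

# Finite population correction (assumption, not on the slides):
# applies when sampling without replacement from a population of size N_pop
se_fpc <- se * sqrt((N_pop - n) / (N_pop - 1))

print(data.frame(n = n, se = round(se, 2), se_fpc = round(se_fpc, 2)))
```

With n = 1 both versions equal σ, and with n = 10,000 the corrected SE is 0, matching the two boundary cases discussed on the slide.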
  • 26. Shape of the Sampling Distribution: Central Limit Theorem: The shape of the sampling distribution approaches normal as N increases. It is roughly normal even for moderate sample sizes, assuming that the original distribution isn’t really weird (i.e., extremely non-normal).
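The Central Limit Theorem can be illustrated by sampling from a clearly non-normal population and watching the skew of the sample means shrink as N grows. This is a sketch under stated assumptions: the exponential population, the sample sizes, the 5000 repetitions, and the helper `skewness()` are all illustrative choices, smaller than the simulation sizes noted on the figure slides so that it runs quickly.

```r
set.seed(42)  # arbitrary seed

# Hypothetical, strongly right-skewed population (assumption)
pop <- rexp(10000, rate = 1)

# Simple moment-based skewness; 0 indicates a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

sample_sizes <- c(2, 10, 50)

# For each N, simulate 5000 sample means and measure their skew
skews <- sapply(sample_sizes, function(n) {
  means <- replicate(5000, mean(sample(pop, n)))
  skewness(means)
})
names(skews) <- paste0("N=", sample_sizes)

print(round(c(population = skewness(pop), skews), 2))
```

The skew of the sample means should move toward 0 as N increases, even though the raw scores themselves stay heavily skewed.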
  • 27. Normal Pop and Various Sampling Distributions. NOTES: Population size = 100,000; simulated 10,000 samples.
  • 28. Uniform Pop and Various Sampling Distributions
  • 29. Skewed Pop and Various Sampling Distributions. NOTE: The x-axis scale changes across figures on this slide.
  • 30. An Important Normal Distribution: Z-scores: Z-scores are normally distributed scores with a mean of 0 and a standard deviation of 1. You can therefore think of a z-score as telling you the position of the score in terms of standard deviations above or below the mean. The probability distribution is known for z-scores. (Figure: standard normal curve with tail areas of 16%, 2.5%, and 0.5% marked on each side.)
  • 31. Probability of Parameter Estimate Given H0: How could you use the z-score distribution to determine the probability of obtaining a sample mean (parameter estimate) of 2.40 if you draw a sample of N = 10 from a population of Ocean Liking scores with a population mean (parameter) of 0? Think about it...
  • 32. Hypothetical Sampling Distribution for H0: If H0 is true, the sampling distribution has a mean of 0 and a standard deviation of $\sigma / \sqrt{N_{sample}} = 23.7 / \sqrt{10} = 7.5$.
  • 33. Hypothetical Sampling Distribution for H0: If H0 is true and this is the sampling distribution (in blue), how likely is it to get a sample mean of 2.4 or more extreme? Pretty likely. But we can do better than that...
  • 34. Our first inferential test: the z-test: $z = \frac{2.4 - 0}{7.5} = 0.32$; p = .749
      > pnorm(0.32, mean=0, sd=1, lower.tail=FALSE) * 2
      [1] 0.7489683
      (Figure: 37.4% of the sampling distribution lies in each tail beyond the observed sample mean.)
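The z-test arithmetic can be written out step by step in base R. This sketch uses the values from the slides (sample mean 2.4, H0 mean 0, σ = 23.67, N = 10); the variable names are illustrative. Because it does not round the SE to 7.5 or z to 0.32 first, the results agree with the slide only to about two or three decimal places.

```r
est   <- 2.4    # parameter estimate: sample mean from slide 13
h0    <- 0      # population mean if H0 is true
sigma <- 23.67  # population SD (known here only because we have the full population)
n     <- 10     # sample size

se <- sigma / sqrt(n)  # standard error of the mean
z  <- (est - h0) / se  # distance from H0 in standard-error units

# Two-tailed p-value: probability of a result this extreme in either direction
p <- 2 * pnorm(abs(z), lower.tail = FALSE)

print(round(c(se = se, z = z, p = p), 3))
```

Using `abs(z)` keeps the two-tailed calculation correct when the sample mean falls below the H0 value.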
  • 35. t vs. z: $z = \frac{2.4 - 0}{7.5} = 0.32$. Where did we get the 2.4 from in our z-test? It is the sample mean from our study. This is our parameter estimate of the population mean of OLS (Like0) scores. Where did we get the 0 from in our z-test? This is the mean of the sampling distribution of the mean if H0 is true. Where did we get the 7.5 from in our z-test, and what is the problem with this? This was the standard deviation of the sampling distribution (the standard error), $\sigma / \sqrt{N_{sample}}$. The problem is that we do not know σ.
  • 36. t vs. z: How can we estimate σ? We can use our sample standard deviation (s), but s is a negatively biased parameter estimate: on average, it will underestimate σ. So what do we do? We account for this underestimation of σ, and therefore of the standard deviation (standard error) of the sampling distribution, by using the t distribution rather than the z distribution to calculate the probability of our parameter estimate if H0 is true. The t distribution is slightly wider, particularly for small sample sizes, to correct for our underestimate of the standard deviation.
  • 37. Our second inferential test: the t-test: $t(df) = \frac{\text{parameter estimate} - \text{parameter under } H_0}{\text{standard error of parameter estimate}}$, where the SE is estimated using s from the sample data, and df = N - P = 10 - 1 = 9.
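The t-test formula can be applied to the first sample's summary statistics from slide 13 (mean 2.4, s = 24.97, N = 10). The slides do not show this calculation, so the sketch below is an illustration with hypothetical variable names, and no exact result is quoted for it.

```r
est <- 2.4    # parameter estimate: sample mean (slide 13)
h0  <- 0      # parameter value under H0
s   <- 24.97  # sample standard deviation (slide 13)
n   <- 10     # sample size

se_hat <- s / sqrt(n)           # SE estimated from the sample, since sigma is unknown
t_stat <- (est - h0) / se_hat   # t statistic
df     <- n - 1                 # N - P, with P = 1 estimated parameter (the mean)

# Two-tailed p-value from the t distribution
p_t <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# For comparison: the p-value if the same statistic were referred to z
p_z <- 2 * pnorm(abs(t_stat), lower.tail = FALSE)

print(round(c(t = t_stat, df = df, p_t = p_t, p_z = p_z), 3))
```

The t-based p-value comes out larger than the z-based one for the same statistic, which reflects the wider tails of the t distribution at small df.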
  • 38. t vs. z: The bias in s decreases with increasing N. Therefore, t approaches z with larger sample sizes.
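One way to see t approaching z is to compare two-tailed critical values at alpha = .05 across degrees of freedom. The particular df values below are arbitrary illustrative choices.

```r
df <- c(9, 29, 99, 999)  # degrees of freedom for N = 10, 30, 100, 1000

t_crit <- qt(0.975, df = df)  # two-tailed .05 critical values for t
z_crit <- qnorm(0.975)        # corresponding critical value for z

print(data.frame(df = df,
                 t_crit = round(t_crit, 3),
                 z_crit = round(z_crit, 3)))
```

The t critical values shrink toward the z critical value as df grows, so the practical difference between the two tests becomes negligible in large samples.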
  • 39. Null Hypothesis Significance Testing (NHST)
      1. Divide reality regarding the size of the population parameter into two non-overlapping possibilities (null hypothesis and alternative hypothesis).
      2. Assume that the null hypothesis is true.
      3. Collect data.
      4. Calculate the probability (p-value) of obtaining your parameter estimate (or a more extreme estimate) given your assumption (i.e., that the null hypothesis is true).
      5. Compare the probability to some cut-off value (alpha level).
      6. (a) If this parameter estimate is less probable than the cut-off value, reject the null hypothesis in favor of the alternative hypothesis.
         (b) If it is not less probable, fail to reject the null hypothesis.
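The NHST steps can be traced end to end on raw data with base R's `t.test()`. This is a sketch under stated assumptions: the sample `x` is simulated rather than real data, the seed and alpha = .05 are illustrative choices, and the manual t statistic is included only to show that it matches what `t.test()` reports.

```r
set.seed(2024)  # arbitrary seed

# Steps 1 and 2: H0: mu = 0 vs. Ha: mu != 0; assume H0 is true
mu0   <- 0
alpha <- 0.05  # cut-off value, chosen before looking at the data

# Step 3: collect data (hypothetical simulated sample of N = 10)
x <- rnorm(10, mean = 0, sd = 23.67)

# Step 4: p-value for the parameter estimate (two-sided by default)
res <- t.test(x, mu = mu0)

# The same t statistic computed by hand from the formula on slide 37
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

# Steps 5 and 6: compare the p-value to alpha and decide
decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"

print(c(t = unname(res$statistic), t_manual = t_manual, p = res$p.value))
print(decision)
```

Fixing alpha before collecting data keeps the decision rule in steps 5 and 6 from depending on the observed result.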