Chapter 3: Producing Data
Introduction
3.1 Design of Experiments
3.2 Sampling Design
3.3 Toward Statistical Inference
3.4 Ethics
3.3 Toward Statistical Inference
Parameters and Statistics
Sampling Variability
Sampling Distribution
Bias and Variability
Sampling from Large Populations
Parameters and Statistics
Using samples to talk about populations
A parameter is a number that describes some characteristic of the population.
In statistical practice, the value of a parameter is not known because we cannot
examine the entire population.

Name         Symbol  Example
Mean         µ       In a nationwide test, what is the average score?
Proportion   p       What proportion of people choose chocolate as their favorite ice cream flavor?

We answer such questions by studying a sample.
A statistic is a number that describes some characteristic of a sample. The
value of a statistic can be computed directly from the sample data.

Name               Symbol  Example
Sample mean        x̄       Sample mean of 100 test scores
Sample proportion  p̂       Sample proportion of 100 people who choose chocolate as their favorite ice cream flavor
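To make the distinction concrete, the sketch below computes the two statistics from the table, x̄ and p̂, for a small sample. The scores and yes/no answers are hypothetical numbers made up for illustration; the point is only that a statistic comes straight from the sample data, while the matching parameter (µ or p) stays unknown.

```python
import statistics

# Hypothetical sample of 10 test scores (illustrative numbers, not from the slides).
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]

# Hypothetical answers from 10 people to "Is chocolate your favorite flavor?"
chooses_chocolate = [True, False, True, True, False, False, True, False, True, False]

# Statistics: computed directly from the sample data.
x_bar = statistics.mean(scores)                          # sample mean, estimates µ
p_hat = sum(chooses_chocolate) / len(chooses_chocolate)  # sample proportion, estimates p

print(f"sample mean x̄ = {x_bar}")        # 80 for these made-up scores
print(f"sample proportion p̂ = {p_hat}")  # 0.5 for these made-up answers
```

The population mean µ and population proportion p would require every member of the population, which is exactly what we cannot observe.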
Parameters and Statistics
Examples:
Proportion of all students who attended the last home football game.
Parameter, p
Proportion of registered voters who voted in November.
Parameter, p
Mean height of a sample of NBA basketball players.
Statistic, x̄
Mean SAT score of all entering freshmen.
Parameter, µ
Proportion of people who prefer Coke over Pepsi in a sample of mall
shoppers.
Statistic, p̂
Mean number of pepperoni slices on a 12-inch pizza from a sample of a
certain brand of pepperoni pizzas.
Statistic, x̄
Statistical Estimation
The process of statistical inference involves using information
from a sample to draw conclusions about a wider population.
Your estimate of a population parameter is only as good as your sampling
design.
Work hard to eliminate biases.
Your sample statistic is only an estimate: if you randomly sampled
again, you would probably get a somewhat different result.
A bigger sample is better: it reduces sampling variability.
Sampling Variability
Each time we take a random sample from a population, we are
likely to get a different set of individuals and calculate a different
statistic. This is called sampling variability.
We ask, “What would happen if we took many samples?”
Take a large number of samples from the same population.
Calculate the sample mean/proportion for each sample.
Make a histogram of these values.
Examine the distribution displayed in the histogram for shape,
center, and spread, as well as outliers or other deviations.
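The four steps above can be simulated directly. This is a minimal sketch, assuming a hypothetical population of 10,000 people of whom 60% answer "yes"; it takes 1,000 SRSs of size 100, records each sample proportion, prints a sideways text histogram, and summarizes center and spread. The population, seed, and sizes are illustrative choices, not values from the slides.

```python
import random
import statistics
from collections import Counter

random.seed(1)  # fixed seed so the simulation is reproducible

# Hypothetical population: 10,000 individuals, 6,000 of whom answer "yes" (coded 1).
population = [1] * 6_000 + [0] * 4_000

NUM_SAMPLES = 1_000  # how many samples to take
SAMPLE_SIZE = 100    # size n of each SRS

# Steps 1 and 2: take many SRSs and compute the sample proportion for each.
p_hats = []
for _ in range(NUM_SAMPLES):
    sample = random.sample(population, SAMPLE_SIZE)  # SRS, drawn without replacement
    p_hats.append(sum(sample) / SAMPLE_SIZE)

# Step 3: sideways text histogram, one '*' per 5 samples (rounded down).
tally = Counter(p_hats)
for value in sorted(tally):
    print(f"{value:.2f} | {'*' * (tally[value] // 5)}")

# Step 4: summarize the center and spread of the distribution.
print(f"mean of sample proportions:    {statistics.mean(p_hats):.3f}")
print(f"std dev of sample proportions: {statistics.stdev(p_hats):.3f}")
```

The histogram should look roughly symmetric and centered near 0.60, with most sample proportions within about 0.10 of it.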
Sampling Variability (cont.)
The sampling distribution of a statistic is the distribution of that
statistic for samples of a given size n taken from the same
population.
The variability of a statistic is described by the spread of its
sampling distribution. This spread depends on the sampling design
and the sample size n, with larger sample sizes leading to lower
variability.
The results of many SRSs have a regular pattern. Here, we draw 1000 SRSs
of size 100 from the same population. The population proportion is p = 0.60.
The histogram shows the distribution of the 1000 sample proportions.
The distribution of sample proportions for 1000 SRSs of size 2500 drawn
from the same population as in the first figure. The two histograms have the same
scale. The statistic from the larger sample is less variable.
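The comparison in the two figures can be reproduced numerically. This sketch assumes the population is so large (at least 100 times the sample size) that each draw behaves like an independent trial with success probability p = 0.60; it draws 1000 samples of size 100 and 1000 of size 2500 and compares the spread of p̂. For reference it also prints the standard formula sqrt(p(1 − p)/n) for the standard deviation of p̂, which is not stated on the slides.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the simulation is reproducible

P = 0.60           # population proportion from the slide
NUM_SAMPLES = 1000  # number of samples per sample size, as on the slide

def sample_proportion(n: int) -> float:
    """Proportion of successes in one sample of size n.

    Assumes a very large population, so each draw is effectively an
    independent trial with success probability P.
    """
    successes = sum(1 for _ in range(n) if random.random() < P)
    return successes / n

results = {}  # sample size n -> simulated standard deviation of p̂
for n in (100, 2500):
    p_hats = [sample_proportion(n) for _ in range(NUM_SAMPLES)]
    results[n] = statistics.stdev(p_hats)
    theory = math.sqrt(P * (1 - P) / n)
    print(f"n = {n:4d}: simulated SD of p̂ = {results[n]:.4f}, "
          f"sqrt(p(1-p)/n) = {theory:.4f}")
```

Because the spread shrinks like 1/sqrt(n), multiplying the sample size by 25 (from 100 to 2500) cuts the standard deviation of p̂ by a factor of about 5.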
Bias and Variability
Both bias and variability describe what happens when we take many
shots at the target.
Bias concerns the center of the sampling
distribution. A statistic used to estimate a
parameter is unbiased if the mean of its
sampling distribution is equal to the true
value of the parameter being estimated.
The variability of a statistic is described
by the spread of its sampling distribution.
This spread is determined by the sampling
design and the sample size n. Statistics
from larger probability samples have
smaller spreads.
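The definition of unbiasedness can be checked by simulation: average the statistic over many samples and compare that center with the true parameter. The sketch below uses a hypothetical population of 10,000 people (p = 0.60) split into an easy-to-reach group that is 80% "yes" and a hard-to-reach group that is 40% "yes". An SRS from the whole population is centered on p; a convenience sample drawn only from the easy-to-reach group is centered near 0.80, and a larger convenience sample does not fix that.

```python
import random
import statistics

random.seed(3)  # fixed seed so the simulation is reproducible

# Hypothetical population of 10,000 people; 6,000 answer "yes" (1), so p = 0.60.
easy_to_reach = [1] * 4_000 + [0] * 1_000  # e.g. mall shoppers: 80% "yes"
hard_to_reach = [1] * 2_000 + [0] * 3_000  # everyone else: 40% "yes"
population = easy_to_reach + hard_to_reach
TRUE_P = sum(population) / len(population)  # the parameter, 0.6

def center_of_sampling_distribution(frame, n, num_samples=1_000):
    """Mean of the sample proportions from num_samples SRSs of size n drawn from `frame`."""
    return statistics.mean(sum(random.sample(frame, n)) / n for _ in range(num_samples))

srs_center = center_of_sampling_distribution(population, 100)
convenience_center = center_of_sampling_distribution(easy_to_reach, 100)
convenience_center_large = center_of_sampling_distribution(easy_to_reach, 1_000)

print(f"true p:                          {TRUE_P:.3f}")
print(f"SRS of 100, center:              {srs_center:.3f}")
print(f"convenience sample of 100:       {convenience_center:.3f}")
print(f"convenience sample of 1000:      {convenience_center_large:.3f}")
```

The larger convenience sample has a smaller spread but the same wrong center, which is why a larger sample reduces variability but not bias.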
Managing Bias and Variability
A good sampling scheme must have both small bias and small variability.
To reduce bias, use random sampling.
To reduce variability of a statistic from an SRS, use a larger sample.
POPULATION SIZE DOESN’T MATTER
The variability of a statistic from a random sample does not depend
on the size of the population, as long as the population is at least
100 times larger than the sample.
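The "population size doesn't matter" rule can be checked with a quick simulation. This sketch assumes two hypothetical populations with the same proportion p = 0.60, one of 10,000 people (100 times the sample size) and one of 1,000,000 people, and compares the spread of p̂ for SRSs of size 100 drawn without replacement from each.

```python
import random
import statistics

random.seed(4)  # fixed seed so the simulation is reproducible

def sd_of_p_hat(pop_size, n=100, p=0.60, num_samples=1000):
    """Simulated standard deviation of p̂ for SRSs of size n
    from a hypothetical population of pop_size with proportion p."""
    yes = round(pop_size * p)
    population = [1] * yes + [0] * (pop_size - yes)
    p_hats = [sum(random.sample(population, n)) / n for _ in range(num_samples)]
    return statistics.stdev(p_hats)

sd_small_pop = sd_of_p_hat(10_000)     # population 100 times the sample size
sd_large_pop = sd_of_p_hat(1_000_000)  # population 10,000 times the sample size

print(f"population 10,000:    SD of p̂ = {sd_small_pop:.4f}")
print(f"population 1,000,000: SD of p̂ = {sd_large_pop:.4f}")
```

Both standard deviations come out close to 0.049: a hundredfold larger population gives essentially the same precision for the same sample size.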
3.4 Ethics
Institutional Review Boards
The organization that carries out the study must have an
institutional review board that reviews all planned studies in
advance in order to protect the subjects from possible harm.
The institutional review board:
reviews the plan of study
can require changes
reviews the consent form
monitors progress at least once a year
Informed Consent
All subjects must give their informed consent before data are
collected.
Subjects must be informed in advance about the nature of a study
and any risk of harm it might bring.
Subjects must then consent in writing.
Who can’t give informed consent?
prison inmates
very young children
people with mental disorders
Confidentiality
All individual data must be kept confidential. Only statistical
summaries may be made public.
Confidentiality is not the same as anonymity. Anonymity means
that subjects are anonymous—their names are not known even to
the director of the study. Anonymity prevents follow-ups to reduce
nonresponse or to inform subjects of their results.
Any breach of confidentiality is a serious violation of data ethics.
The best practice is to separate the identity of the subjects from
the rest of the data immediately!
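Separating identities from the data can be done as soon as records come in. This is a minimal sketch of one way to do it, assuming hypothetical survey records and a hypothetical helper `separate_identities`: each subject gets a random study ID, identifying fields go into a key table meant to be stored separately under restricted access, and only the de-identified data table is used for analysis.

```python
import secrets

# Hypothetical raw survey records (names, emails, and answers are made up).
raw_records = [
    {"name": "Subject A", "email": "a@example.com", "age": 34, "response": "yes"},
    {"name": "Subject B", "email": "b@example.com", "age": 51, "response": "no"},
]

def separate_identities(records, identity_fields=("name", "email")):
    """Split records into a key table (study ID -> identity) and a de-identified data table.

    Hypothetical helper: the key table should be stored apart from the data,
    with access limited to the study director.
    """
    key_table = {}
    data_table = []
    for record in records:
        study_id = secrets.token_hex(8)  # random ID that reveals nothing about the subject
        key_table[study_id] = {field: record[field] for field in identity_fields}
        data_table.append({"study_id": study_id,
                           **{k: v for k, v in record.items() if k not in identity_fields}})
    return key_table, data_table

key_table, data_table = separate_identities(raw_records)
print(data_table)  # analysis sees only study IDs, ages, and responses
```

Because the key table still links IDs to people, this setup keeps the data confidential but not anonymous: the director can still contact subjects for follow-ups.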
Clinical Trials
Clinical trials study the effectiveness of medical treatments on actual
patients—these treatments can harm as well as heal.
Points for a discussion:
Randomized comparative experiments are the only way to
see the true effects of new treatments.
Most benefits of clinical trials go to future patients. We must
balance future benefits against present risks.
Behavioral and Social Science Experiments
Many behavioral experiments rely on hiding the true purpose of the
study.
Subjects would change their behavior if told in advance what
investigators were looking for.
The “Ethical Principles” of the American Psychological Association
require consent unless a study only observes behavior in a public
space.