2. DISTRIBUTION REVIEW
• We use a distribution to better understand what the data look like
• A police unit records the speed of passing vehicles on a road with a speed limit of 45
miles/hour
• Use the speed.csv dataset on Blackboard to find the mean, standard deviation, and
histogram of the recorded speeds
• For the histogram (in Stata), set the number of bins to 5, display frequency on the Y-axis,
add height labels to the bars, and change Range/value (major tick-label properties) of the X-axis to
min 25, max 90, and delta of 5
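The same summary can be sketched in Python. This is an illustration only: the `speeds` list holds made-up values standing in for speed.csv (its contents are not shown here), and the 5 equal-width bins between 25 and 90 are an assumption that mirrors the axis range in the exercise, not necessarily the bins Stata would choose.

```python
import statistics

# Hypothetical speeds in miles/hour; NOT the contents of speed.csv.
speeds = [38, 42, 45, 47, 51, 44, 63, 39, 55, 48, 72, 41]

mean_speed = statistics.mean(speeds)
sd_speed = statistics.stdev(speeds)  # sample standard deviation (divides by n - 1)

# Frequency histogram with 5 equal-width bins between 25 and 90 (assumed edges).
low, high, n_bins = 25, 90, 5
width = (high - low) / n_bins
counts = [0] * n_bins
for s in speeds:
    # A value equal to the top edge is folded into the last bin.
    index = min(int((s - low) / width), n_bins - 1)
    counts[index] += 1

print(f"mean = {mean_speed:.2f}, sd = {sd_speed:.2f}")
for i, c in enumerate(counts):
    print(f"bin from {low + i * width:.0f} to {low + (i + 1) * width:.0f}: {c}")
```

The bar heights printed at the end correspond to the frequency labels requested for the Stata histogram.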
3. POPULATION VS. SAMPLE
• Most of the time our questions are about the universe (population).
• For example:
• “do people who sleep more drink less coffee?”
• Or “are smaller class sizes associated with higher test scores?”
• Or “what is the relationship between price and age of residential real estate?”
• Unfortunately, we are not able to collect data from every single person, class, or
house in the world
• We use samples of data to answer questions about the population
4. SOME OTHER QUESTIONS WE CAN ANSWER BY USING
SAMPLES
• What are the mean earnings of people in their 20s?
• What is the standard deviation of the earnings of people in their 20s?
• What is the variance of the earnings of people in their 20s?
• What is the average unemployment rate in New York City?
• What is the average height of a GU student?
5. PARAMETER, STATISTIC, AND ESTIMATOR
• When we want to find out the mean earnings (𝜇) of people in their 20s we are looking
for a population parameter
• When we collect answers from a sample of people in their 20s and calculate the average
earnings (𝑥̄) we have found the sample statistic
• We use sample statistics to estimate population parameters
• Sample statistics are estimators of population parameters
• In this specific example 𝑥̄ is an estimator of 𝜇
• Estimate: the numerical value of the estimator when it is actually computed using data
from a specific sample
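The three terms can be told apart in a short Python sketch. Everything here is an assumption made for illustration: the population is simulated with made-up earnings, and the names `population`, `sample_mean`, `mu` and `x_bar` are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Simulated population of earnings (made-up numbers, for illustration only).
population = [random.gauss(40_000, 8_000) for _ in range(100_000)]

# Parameter: the population mean (mu). Normally unknown; known here only
# because the population is simulated.
mu = statistics.mean(population)

def sample_mean(sample):
    """Estimator: a rule that turns any sample into a guess for mu."""
    return sum(sample) / len(sample)

# Estimate: the number the estimator returns for one specific sample.
sample = random.sample(population, 200)
x_bar = sample_mean(sample)

print(f"parameter mu = {mu:.0f}, estimate x_bar = {x_bar:.0f}")
```

The function is the estimator; the single number it returns for this particular sample is the estimate.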
6. PARAMETER, STATISTIC, AND ESTIMATOR EXAMPLE
• In the regression of test scores on class size
• What is the general regression?
• What is the fitted regression?
• What is the population here?
• What is the sample?
• What is the parameter we are trying to estimate?
• What is the estimator of that parameter?
• What is the estimate?
• Difference between estimator and estimate?
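One way to make these questions concrete is a small simulation. This is a sketch under stated assumptions: the values of `BETA_0` and `BETA_1`, the class sizes, and the noise level are all made up, and `ols_slope` is a hypothetical helper that applies the usual OLS slope formula.

```python
import random

random.seed(1)

# Hypothetical population relationship (general regression):
#   score = BETA_0 + BETA_1 * class_size + error
BETA_0, BETA_1 = 700.0, -2.0  # parameters; made-up values

def ols_slope(x, y):
    """Estimator of BETA_1: the OLS slope, sum of cross-deviations over sum of squares."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx

# One sample of 100 classes drawn from the simulated population.
class_size = [random.randint(15, 30) for _ in range(100)]
score = [BETA_0 + BETA_1 * s + random.gauss(0, 10) for s in class_size]

n = len(class_size)
beta_1_hat = ols_slope(class_size, score)  # the estimate of BETA_1
beta_0_hat = sum(score) / n - beta_1_hat * sum(class_size) / n

# Fitted regression: score_hat = beta_0_hat + beta_1_hat * class_size
print(f"fitted: score_hat = {beta_0_hat:.1f} + ({beta_1_hat:.2f}) * class_size")
```

Here `BETA_1` plays the role of the parameter, `ols_slope` the estimator, and `beta_1_hat` the estimate from this one sample.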
7. IN CLASS
You run a regression of coffee consumption on the number of hours one sleeps at night
and find that the coefficient on the number of hours slept at night is -0.75. Please choose
the answer that lists the following in the correct order: population in question; sample;
parameter; estimator; estimate
A. Everyone on the planet; the group of people you interview for your homework; 𝛽̂1; 𝛽1; -0.75
B. Everyone on the planet; the group of people you interview for your homework; 𝛽1; 𝛽̂1; -0.75
C. The group of people you interview for your homework; everyone on the planet; 𝛽̂1; 𝛽1; -0.75
D. Everyone on the planet; the group of people you interview for your homework; 𝛽1; -0.75; 𝛽̂1
8. REPEATED SAMPLES
• Suppose you are trying to evaluate an estimator by repeatedly drawing random samples
from a population
• Example 1.
• To estimate the relationship between hours slept and caffeine consumed, the class divides into 8
groups; each group collects data and runs a regression. This gives 8 estimates of the relationship. What
is the estimator here?
• Example 2.
• To estimate the average height of a GU student each of you asks 30 students how tall they are.
This way we will get 24 estimates of the average height of a GU student. What is the estimator
here?
• With repeated samples, the estimator yields not one number but a series of
numbers (one estimate per sample). We are going to be working with the distribution of those estimates.
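Example 2 can be sketched as a simulation. The population below is made up (7,500 simulated heights in cm, with an assumed mean and spread), so the printed numbers only illustrate that the same estimator produces a different estimate in each sample.

```python
import random
import statistics

random.seed(2)

# Simulated student heights in cm (made-up population, for illustration only).
population = [random.gauss(168, 9) for _ in range(7_500)]

# 24 surveyors each ask 30 randomly chosen students and report a sample mean.
estimates = [statistics.mean(random.sample(population, 30)) for _ in range(24)]

# The estimator (sample mean) has a distribution across repeated samples.
print(f"lowest estimate   = {min(estimates):.1f}")
print(f"highest estimate  = {max(estimates):.1f}")
print(f"mean of estimates = {statistics.mean(estimates):.1f}")
print(f"sd of estimates   = {statistics.stdev(estimates):.1f}")
```

The estimator is the sample mean in every case; the list `estimates` is the series of numbers whose distribution we study.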
9. RANDOM SAMPLE
• If we estimate the height of GU students by using a sample of people we meet outside
the basketball locker room, we are not going to have a random sample
• If we estimate the unemployment rate by going to a park at 10 am on a weekday and
asking the people we run into if they are employed, we are not going to have a random sample
• I am trying to figure out the relationship between the number of DUIs in a state and
alcohol consumption. What sample should I use?
• Random sample means that everyone in the population has an equal chance of ending
up in the sample
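The locker room example can be sketched in Python. All numbers are assumptions chosen for illustration: a made-up population of 7,400 students plus 100 taller basketball players, with hypothetical names throughout.

```python
import random
import statistics

random.seed(3)

# Made-up population: 7,400 students plus 100 basketball players (heights in cm).
others = [random.gauss(167, 8) for _ in range(7_400)]
players = [random.gauss(195, 6) for _ in range(100)]
population = others + players
true_mean = statistics.mean(population)

# Random sample: random.sample gives every student the same chance of selection.
random_sample = random.sample(population, 50)

# Non-random sample: only people met outside the locker room.
locker_room_sample = random.sample(players, 50)

print(f"population mean    = {true_mean:.1f}")
print(f"random sample mean = {statistics.mean(random_sample):.1f}")
print(f"locker room mean   = {statistics.mean(locker_room_sample):.1f}")
```

The locker room sample overstates the average height because most students had no chance of being included.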
10. ESTIMATOR PROPERTY 1. UNBIASEDNESS
• If you evaluate an estimator many times by using repeatedly drawn samples, it is
reasonable to hope that on average you would get the right answer.
• Example: we are trying to estimate the average height of a GU student. We collect data
from random samples 5 times and get the following results
Sample number    Average height in the sample (cm)
1                167.64
2                175.26
3                160.02
4                152.4
5                177.8
11. UNBIASEDNESS CONT’D
• What is the average number across the samples?
• If our estimator 𝑥̄ is unbiased, then the average height of a GU student should be about 5 feet 6
inches (about 166.6 cm)
• If you decided to estimate the average height in your sample by using the following
formula:
• $\frac{\sum_{i=1}^{N} \text{height}_i}{N} + 2$
• you would have a biased estimator
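A minimal sketch of both points, in Python. The first part averages the five sample means from the table above. The second part reads the formula above as "sample mean plus 2" (an assumption about the slide's intent) and checks it against a simulated, made-up population; the names `plain_mean` and `mean_plus_two` are hypothetical.

```python
import random
import statistics

random.seed(4)

# Sample averages (cm) from the five samples in the table above.
sample_means = [167.64, 175.26, 160.02, 152.4, 177.8]
average_of_means = statistics.mean(sample_means)
print(f"average across the five samples = {average_of_means:.1f} cm")

# Simulated population (made-up heights in cm) to compare two estimators.
population = [random.gauss(166, 9) for _ in range(7_500)]
mu = statistics.mean(population)

def plain_mean(heights):
    """Sample mean: unbiased for the population mean under random sampling."""
    return sum(heights) / len(heights)

def mean_plus_two(heights):
    """Sample mean plus 2: off by 2 on average, no matter how many samples."""
    return sum(heights) / len(heights) + 2

# Evaluate both estimators on the same 2,000 repeated random samples.
draws = [random.sample(population, 30) for _ in range(2_000)]
avg_plain = statistics.mean([plain_mean(d) for d in draws])
avg_shifted = statistics.mean([mean_plus_two(d) for d in draws])

print(f"population mean         = {mu:.2f}")
print(f"average of plain_mean   = {avg_plain:.2f}")
print(f"average of mean_plus_two = {avg_shifted:.2f}")
```

Averaged over many samples, the plain mean lands near the population mean, while the shifted version stays about 2 cm too high.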
12. ESTIMATOR PROPERTY 2. CONSISTENCY.
• The larger the sample, the closer the estimator tends to be to the population parameter
• Example: The larger the sample of GU students you survey about their height, the closer
the estimate of the height will tend to be to the actual average height of a GU student
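Consistency can be illustrated with a simulation: as the sample size grows, the estimates from repeated samples cluster more tightly around the population mean. The population below is made up, and `spread_of_estimates` is a hypothetical helper.

```python
import random
import statistics

random.seed(5)

# Simulated student heights in cm (made-up population, for illustration only).
population = [random.gauss(166, 9) for _ in range(7_500)]

def spread_of_estimates(n, repetitions=300):
    """Standard deviation of the sample mean across repeated samples of size n."""
    estimates = [sum(random.sample(population, n)) / n
                 for _ in range(repetitions)]
    return statistics.stdev(estimates)

for n in (4, 30, 300, 3_000):
    print(f"n = {n:>5}: sd of estimates = {spread_of_estimates(n):.2f}")
```

The printed spread shrinks as n grows, which is the behavior the slide describes.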
13. ESTIMATOR PROPERTY 3. VARIANCE AND EFFICIENCY
• We want the estimator with the smaller variance. That estimator will be the more
efficient one.
• Example: we are trying to estimate the average height of a GU student. We collect data
from random samples 5 times and get the following results
Sample number    Average height in the sample (cm)    Median height in the sample (cm)
1                167.64                               167.64
2                175.26                               172.72
3                160.02                               162.56
4                152.4                                165.1
5                177.8                                162.56
Variance         111.6                                18.06
14. EFFICIENCY CONT’D
• The median appears to have a smaller variance, which means it is a more efficient estimator
of the average height of a GU student
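The variance row of the table above can be reproduced with Python's standard library. The sketch assumes the table uses the sample variance (dividing by n - 1), which is what `statistics.variance` computes.

```python
import statistics

# Columns from the table above (cm).
sample_means = [167.64, 175.26, 160.02, 152.4, 177.8]
sample_medians = [167.64, 172.72, 162.56, 165.1, 162.56]

# statistics.variance is the sample variance (divides by n - 1).
var_means = statistics.variance(sample_means)
var_medians = statistics.variance(sample_medians)

print(f"variance of sample means   = {var_means:.2f}")
print(f"variance of sample medians = {var_medians:.2f}")
```

Five samples is a small basis for comparing two estimators, so the comparison is illustrative rather than conclusive.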
15. REVIEW.
• Please explain unbiasedness, consistency, and variance by applying the ideas to the
example of the relationship between coffee consumed and hours slept.
• What is the difference between parameter, estimator and estimate?
Editor's Notes
Underline that the statistic is the estimator of the population value
You could have a different estimator, for example, you could say that the correlation coefficient is the estimator. Correct answer: B
For your project make sure the sample you collect is random. You will have to discuss why you think your sample is random. If the sample is random then the estimator will be close to the population parameter
The average height turns out to be 166.6 cm, or about 5 feet 6 inches
Gonzaga University has about 7,500 students. If you measure the height of only 4 students, that sample size would not be enough.