DoW #6: TVs and Life Expectancies
For this week's DoW, you will explore the question:
Is there a relationship between life expectancy and the number
of people per TV for a country?
The Excel file, TV Life, contains data for the variables Life
Expectancy and People per TV for a sample of 22 countries. We
will analyze and interpret this data throughout this week’s
investigations.
In Investigation 1, you will post your responses to Exercise B4
by Wednesday, 10 PM EST, and follow up by Friday, 10 PM EST.
In Investigation 2, you will post your responses to Exercise E5
by Saturday, 10 PM EST, and follow up by Sunday, 10 PM EST.
Investigation 1: Measuring Association
In this investigation, we look at the concept of association
(the relationship between the two variables) and ways to
identify and measure the relationship in quantitative bivariate
data. We will look at scatterplots and the correlation
coefficient.
Inv 1, Activity A: Seeing the Association
Exercise A1:
Complete Annenberg Series for Session 7, Parts A, B, and C.
(We will complete Part D in Investigation 2, but you can do it
here if you prefer.)
Reflect on the following questions in your journal:
How does the contingency table (also called a two-way table)
show the relationship seen in the scatter plot?
The height=armspan line is also called the y=x line (height is
the y-axis variable, armspan is the x-axis variable). What does
it mean if a point is above this line? Below this line?
Exercise A2:
Analyze the data for DoW #6 in your calculator or on an
applet. Record your answers in your journal:
What are the variables? Are they quantitative or categorical?
Create a scatterplot for the data in DoW #6, with the variable
People Per TV on the x-axis.
Describe the relationship you see in the data (if any).
Are there any points on the scatterplot that do not seem to
follow the general trend of the data? If so, what are they and
why do they seem “different”?
Inv 1, Activity B: Describing Association
We use the term association to refer to a relationship between
two variables in which information about one variable reveals
information about the other variable. In this investigation we
will look at the association between two quantitative variables.
Associations can be positive or negative, and they can be strong
or weak.
Two variables have a:
Positive association if larger values of one variable tend to
occur with larger values of the other variable. So, the two
variables tend to increase (or decrease) together.
Negative association if larger values of one variable tend to
occur with smaller values of the other variable. So, as one
variable tends to increase, the other tends to decrease.
Two variables have a:
Strong association if observations tend to closely follow the
pattern (of positive association or of negative association).
With a stronger association, one could more accurately use one
variable to “predict” values of the other variable.
Weak association if observations tend to follow the pattern more
loosely. With a weaker association, predictions may not be as
accurate as they would be with a strong association.
The table below shows the four combinations of positive and
negative, strong and weak associations as they might appear in
scatter plots, as well as one with nearly no association.
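The direction and strength described above can also be read off the correlation coefficient r: its sign gives the direction, and how close it is to 1 or -1 gives the strength. The following is a minimal sketch, assuming the usual Pearson formula; the function name `pearson_r` and all of the numbers are made up for illustration and are not the TV Life data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration data (assumption), NOT the TV Life data.
x_demo = [1, 2, 3, 4, 5]
strong_positive = [2, 4, 6, 8, 10]  # points lie exactly on a rising line
strong_negative = [10, 8, 6, 4, 2]  # points lie exactly on a falling line
weak_positive = [3, 1, 5, 2, 4]     # loosely rising pattern

r_pos = pearson_r(x_demo, strong_positive)   # close to 1
r_neg = pearson_r(x_demo, strong_negative)   # close to -1
r_weak = pearson_r(x_demo, weak_positive)    # positive, but much nearer 0

print(r_pos, r_neg, r_weak)
```

A value near 0 would correspond to the "nearly no association" case.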
Exercise E1:
Return to your work for DoW #6.
Add a best fit line to the scatter plot you made in Exercise A2.
Record the equation of this line. What do x and y represent in
this equation? Here's a video tutorial for doing this in Excel.
Add a Least Squares Regression line to the scatter plot you made
in Exercise A2. Record the equation for this line. How does it
compare to the line you placed? A video tutorial for doing this
in Excel is also available.
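If you would like to check the line Excel or your calculator reports, the least squares slope and intercept can be computed by hand. This is only a sketch under stated assumptions: the helper name `least_squares_line` and the five data points are hypothetical, chosen for easy arithmetic, and are not the TV Life data. In the DoW, x would be People per TV and y would be Life Expectancy.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

# Made-up illustration data (assumption), NOT the TV Life data.
x_demo = [1, 2, 3, 4, 5]
y_demo = [78, 76, 75, 72, 71]

slope, intercept = least_squares_line(x_demo, y_demo)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

A negative slope, as in this toy data, means the predicted y goes down as x goes up.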
Exercise E2:
What is the proportion of variability (r^2) for the Least Squares
Regression line? Interpret this value in the context of the DoW.
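One common way to think about the proportion of variability is as the share of the total variation in y that the regression line accounts for: r^2 = 1 - SSE/SST, where SST is the sum of squared distances of the y-values from their mean. The sketch below assumes that definition; the function name `r_squared` and the data are made up for illustration and are not the TV Life data.

```python
def r_squared(xs, ys):
    """Proportion of the variation in ys accounted for by the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # SSE: variation left over after using the line to predict y.
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # SST: total variation of y around its own mean.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Made-up illustration data (assumption), NOT the TV Life data.
x_demo = [1, 2, 3, 4, 5]
y_demo = [78, 76, 75, 72, 71]

r2 = r_squared(x_demo, y_demo)
print(round(r2, 3))
```

A value near 1 would mean the line accounts for most of the variation in y; a value near 0 would mean it accounts for very little.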
Exercise E3:
In the US, there are 1.3 people per TV. Use the Least Squares
Regression line to calculate a prediction for the Life
Expectancy in the US.
The actual Life Expectancy in the US is 75.5 years. What is the
error for the prediction you made in E2?
The error you calculated in Exercise E3 is called the Residual
Error for the prediction.
Residual Error = Actual Value – Predicted Value
This is the same “error” you looked at in Activity D, when you
found the SSE. (The SSE is a measure of the “total” error of the
prediction line.)
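The residual formula and the SSE can be written out in a few lines. This is a sketch only: the helper names, the data, and the line y = -1.8x + 79.8 are all made up for illustration and are not the TV Life data or its regression line.

```python
def residual(actual, predicted):
    """Residual Error = Actual Value - Predicted Value."""
    return actual - predicted

def sum_squared_errors(xs, ys, slope, intercept):
    """SSE: add up the squared residuals over every data point."""
    return sum(residual(y, slope * x + intercept) ** 2 for x, y in zip(xs, ys))

# Made-up data and a made-up line (assumptions), NOT the TV Life data.
x_demo = [1, 2, 3, 4, 5]
y_demo = [78, 76, 75, 72, 71]
demo_slope, demo_intercept = -1.8, 79.8

residuals = [residual(y, demo_slope * x + demo_intercept)
             for x, y in zip(x_demo, y_demo)]
sse = sum_squared_errors(x_demo, y_demo, demo_slope, demo_intercept)

print([round(e, 2) for e in residuals])
print(round(sse, 2))
```

A positive residual means the actual value lies above the line; a negative residual means it lies below. For Exercise E3, the actual value is 75.5 and the predicted value is whatever your own regression line gives at x = 1.3.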
The plot you made in Exercise E3 is called a Residual Plot.
This plot allows us to easily identify points that do not fit the
trend line. When looking at a distribution, we called
observations that fell far outside the expected pattern of
variation outliers. In Week 4, we discussed ways to identify
outliers in a distribution and considered removing them from the
data in order to better see the patterns in the variability. A
residual plot is a tool for identifying outliers in a scatter
plot. Likewise, we consider whether or not to remove such points
from the plot.
It is also possible for a point to appear to be in line with the
trend of the data, but still be an outlier. Consider the point
circled in the scatter plot below:
If this point were to be removed, it likely would have a drastic
effect on the regression equation. It is called an influential
outlier.
Such points are far from the rest of the data, horizontally, and
likely would appear as outliers in the distribution for the
x-variable (using the test of 3 standard deviations from the
mean). Removing such points can greatly affect the regression
line (particularly the slope).
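To see how much one point that sits far away horizontally can pull the line, you can fit the line with and without it and compare slopes. The sketch below uses made-up numbers (not the TV Life data) and a hypothetical helper `ls_slope`; the last point is deliberately placed far to the right of the others.

```python
def ls_slope(xs, ys):
    """Slope of the least squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up illustration data (assumption), NOT the TV Life data.
# The last point (30, 20) is far from the others along the x-axis.
x_all = [1, 2, 3, 4, 5, 30]
y_all = [5, 4, 6, 5, 5, 20]

slope_with = ls_slope(x_all, y_all)               # all six points
slope_without = ls_slope(x_all[:-1], y_all[:-1])  # far-right point removed

print(round(slope_with, 3), round(slope_without, 3))
```

In this toy example the slope is several times larger with the far-right point than without it, even though that point does not look wildly off the overall trend.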
For more information on outliers in a regression model, see the
website.
Exercise E4:
Right-click on the scatter plot for DoW #6 and select Make
Residual Plot. A new plot will appear beneath the scatter plot,
showing the residual error for each data point.
Click on the two points in the Residual Plot that have the
greatest Residual Errors. What countries are they? What are the
errors in the prediction?
Save the file under a new name. Then, delete these two values.
What happens to the correlation coefficient (r)? The best-fit
equation? The proportion of variation (r^2)?
Would you consider these points to be outliers? Do you think
we are justified in removing them from the data? Why or why
not? Are there any other outliers you might remove? Explain.
Optional:
Do you think there are any influential outliers for this
regression? If so, try removing them and note how the
regression equation and proportion of variation change. Also,
consider whether or not there is a reason for removing them
(aside from the fact that they affect the regression).
The following links will take you to two tutorials for creating
residual plots. The first is for a graphing calculator. The
second is for Excel.
Exercise E5:
Return to our original question for DoW #6:
Is there a relationship between life expectancy and the number
of people per TV for a country?
Consider the analyses you have completed for the DoW in
Investigations 1 and 2. Now it’s time to interpret the analyses.
How does the data answer this question? How do the tools
we’ve used support this answer? What questions do you have
about this analysis?
Write at least three summary statements interpreting DoW #6,
supported with facts from the data and analyses.
Post these three statements (as well as any additional thoughts
or questions you have on the DoW) to your group’s DB by
Saturday, 10 PM EST.