Correlation and regression

Introduction to Correlation and Regression
Ginger Holmes Rowell, Ph.D.
Associate Professor of Mathematics
Middle Tennessee State University

Outline
 Introduction
 Linear Correlation
 Regression
 Simple Linear Regression
 Using the TI-83
 Model/Formulas

Outline continued
 Applications
 Real-life Applications
 Practice Problems
 Internet Resources
 Applets
 Data Sources

Correlation
 Correlation: a measure of association between two numerical variables.
 Example (positive correlation): typically, in the summer, as the temperature increases, people are thirstier.

Specific Example
For seven random summer days, a person recorded the temperature and their water consumption during a three-hour period spent outside.

Temperature (F)   Water Consumption (ounces)
75                16
83                20
85                25
85                27
92                32
97                48
99                48

How would you describe the graph?

How “strong” is the linear relationship?

Measuring the Relationship
Pearson’s Sample Correlation Coefficient, r, measures the direction and the strength of the linear association between two numerical paired variables.

Direction of Association
[Figure panels: Positive Correlation, Negative Correlation]

Strength of Linear Association
r value   Interpretation
 1        perfect positive linear relationship
 0        no linear relationship
-1        perfect negative linear relationship

Other Strengths of Association
r value   Interpretation
0.9       strong association
0.5       moderate association
0.25      weak association

Formula
Σ = the sum
n = number of paired items
x_i = input variable
y_i = output variable
x-bar = mean of x’s
y-bar = mean of y’s
s_x = standard deviation of x’s
s_y = standard deviation of y’s
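The symbols listed above fit the standard "average product of standardized values" form of Pearson's r, namely r = (1/(n-1)) Σ ((x_i - x-bar)/s_x)((y_i - y-bar)/s_y). The sketch below assumes that form and applies it to the temperature/water data from the Specific Example. The function and variable names (`sample_std`, `pearson_r`, `temps`, `water`) are illustrative and not from the slides.

```python
import math

# Temperature (F) and water consumption (ounces) from the Specific Example.
temps = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient.

    Assumed form:
    r = (1 / (n - 1)) * sum(((x_i - x_bar) / s_x) * ((y_i - y_bar) / s_y))
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sample_std(xs)
    s_y = sample_std(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


r = pearson_r(temps, water)
print(f"r = {r:.3f}")
```

For this data set the result is close to 1, which matches the strong positive association visible in the table.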

Regression
 Specific statistical methods for finding the “line of best fit” for one response (dependent) numerical variable based on one or more explanatory (independent) variables.

Curve Fitting vs. Regression
 Regression includes using statistical methods to assess the "goodness of fit" of the model (e.g., the correlation coefficient).

Regression: 3 Main Purposes
 To describe (or model)
 To predict (or estimate)
 To control (or administer)

Simple Linear Regression
 Statistical method for finding the “line of best fit” for one response (dependent) numerical variable based on one explanatory (independent) variable.

Least Squares Regression
 GOAL: minimize the sum of the squares of the errors of the data points.
This minimizes the Mean Square Error.
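The least-squares goal can be checked numerically. The sketch below fits a line with the usual closed-form least-squares expressions and then nudges the slope and intercept to confirm that each nudge increases the sum of squared errors. The data are the temperature/water values from the Specific Example. The function names and the nudge sizes are arbitrary choices made for illustration.

```python
# Temperature (F) and water consumption (ounces) from the Specific Example.
temps = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def sum_squared_errors(slope, intercept, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Closed-form least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares_line(temps, water)
best = sum_squared_errors(slope, intercept, temps, water)
print(f"least-squares line: SSE = {best:.2f}")

# Moving either coefficient away from the least-squares values raises the SSE.
nudged_sse = []
for d_slope, d_intercept in [(0.1, 0.0), (-0.1, 0.0), (0.0, 2.0), (0.0, -2.0)]:
    sse = sum_squared_errors(slope + d_slope, intercept + d_intercept, temps, water)
    nudged_sse.append(sse)
    print(f"slope {d_slope:+.1f}, intercept {d_intercept:+.1f}: SSE = {sse:.2f}")
```

Dividing the sum of squared errors by a fixed count gives a mean square error, so the line that minimizes one also minimizes the other.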

Example
 Plan an outdoor party.
 Estimate the number of soft drinks to buy per person, based on how hot the weather is.
 Use the Temperature/Water data and regression.

Steps to Reaching a Solution
 Draw a scatterplot of the data.
 Visually, consider the strength of the linear relationship.
 If the relationship appears relatively strong, find the correlation coefficient as a numerical verification.
 If the correlation is still relatively strong, then find the simple linear regression line.

Our Next Steps
 Learn to use the TI-83 for correlation and regression.
 Interpret the results (in the context of the problem).

Finding the Solution: TI-83
 Using the TI-83 graphing calculator:
 Turn on the calculator diagnostics.
 Enter the data.
 Graph a scatterplot of the data.
 Find the equation of the regression line and the correlation coefficient.
 Graph the regression line on a graph with the scatterplot.

Preliminary Step
 Turn the Diagnostics On.
 Press 2nd 0 (for Catalog).
 Scroll down to DiagnosticOn. The marker points to the right of the words.
 Press ENTER. Press ENTER again.
 The word Done should appear on the right-hand side of the screen.

Example
Temperature (F)   Water Consumption (ounces)
75                16
83                20
85                25
85                27
92                32
97                48
99                48

1. Enter the Data into Lists
 Press STAT.
 Under EDIT, select 1: Edit.
 Enter x-values (input) into L1.
 Enter y-values (output) into L2.
 After data is entered in the lists, go to 2nd MODE to quit and return to the home screen.
 Note: If you need to clear out a list, for example list 1, place the cursor on L1, then hit CLEAR and ENTER.

2. Set up the Scatterplot.
 Press 2nd Y= (STAT PLOTS).
 Select 1: PLOT 1 and hit ENTER.
 Use the arrow keys to move the cursor down to On and hit ENTER.
 Arrow down to Type: and select the first graph under Type.
 Under Xlist: enter L1.
 Under Ylist: enter L2.
 Under Mark: select any of the marks.

3. View the Scatterplot
 Press 2nd MODE to quit and return to the home screen.
 To plot the points, press ZOOM and select 9: ZoomStat.
 The scatterplot will then be graphed.

4. Find the regression line.
 Press STAT.
 Press CALC.
 Select 4: LinReg(ax + b).
 Press 2nd 1 (for List 1).
 Press the comma key.
 Press 2nd 2 (for List 2).
 Press ENTER.
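For readers without a TI-83 at hand, the quantities that LinReg(ax + b) reports with diagnostics on (a, b, r^2, r) can be reproduced in a few lines of Python. This is a minimal sketch using the textbook least-squares expressions, not a description of the calculator's internal algorithm. `lin_reg`, `temps`, and `water` are illustrative names, with `temps` standing in for L1 and `water` for L2.

```python
import math

# L1 (x-values) and L2 (y-values) from the Example table.
temps = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def lin_reg(xs, ys):
    """Return (a, b, r2, r) for the least-squares line y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = s_xy / s_xx                   # slope
    b = y_bar - a * x_bar             # y-intercept
    r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
    return a, b, r * r, r


a, b, r2, r = lin_reg(temps, water)
print(f"a  = {a:.4f}")
print(f"b  = {b:.4f}")
print(f"r2 = {r2:.4f}")
print(f"r  = {r:.4f}")
```

The unrounded slope is a little under 1.5 and the intercept is close to -96.8, so the equation quoted later in the slides is a rounded version of these values.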

5. Interpreting and Visualizing
 Interpreting the result: y = ax + b
 The value of a is the slope.
 The value of b is the y-intercept.
 r is the correlation coefficient.
 r^2 is the coefficient of determination.

5. Interpreting and Visualizing
 Write down the equation of the line in slope-intercept form.
 Press Y= and enter the equation under Y1. (Clear all other equations.)
 Press GRAPH and the line will be graphed through the data points.

Interpretation in Context
 Regression Equation: y = 1.5*x - 96.9
Water Consumption = 1.5*Temperature - 96.9

 Slope = 1.5 (ounces)/(degrees F)
 For each 1 degree F increase in temperature, you expect an increase of 1.5 ounces of water consumed.

y-intercept = -96.9
 For this example, when the temperature is 0 degrees F, a person would drink about -97 ounces of water.
 That does not make any sense!
 Our model is not applicable for x = 0, which lies far outside the observed temperatures (75 to 99 degrees F).

Prediction Example
 Predict the amount of water a person would drink when the temperature is 95 degrees F.
 Solution: Substitute the value x = 95 (degrees F) into the regression equation and solve for y (water consumption).
If x = 95, y = 1.5*95 - 96.9 = 45.6 ounces.
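The substitution above can be sketched in code, together with a simple check that the input lies inside the observed temperature range (the reason x = 0 gave a meaningless answer). The sketch also refits the line from the seven data points to show how much the rounding of the coefficients matters. The names `fit_line`, `predict`, and `within_data` are hypothetical helpers, not part of the slides.

```python
# Temperature (F) and water consumption (ounces) from the example data.
temps = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def predict(temp_f, slope, intercept):
    """Predicted water consumption (ounces) at the given temperature."""
    return slope * temp_f + intercept


def within_data(temp_f, xs):
    """True when the temperature falls inside the observed range."""
    return min(xs) <= temp_f <= max(xs)


# Prediction with the rounded equation from the slides.
rounded = predict(95, 1.5, -96.9)

# Prediction with unrounded least-squares coefficients.
slope, intercept = fit_line(temps, water)
unrounded = predict(95, slope, intercept)

print(f"rounded equation:   {rounded:.1f} ounces")
print(f"unrounded equation: {unrounded:.1f} ounces")
print(f"95 F inside data range? {within_data(95, temps)}")
print(f" 0 F inside data range? {within_data(0, temps)}")
```

With unrounded coefficients the prediction at 95 degrees F comes out near 41 ounces rather than 45.6, so it may be worth carrying more decimal places for the slope when the prediction itself matters.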

Strength of the Association: r^2
 Coefficient of Determination: r^2
 General Interpretation: The coefficient of determination tells the percent of the variation in the response variable that is explained (determined) by the model and the explanatory variable.

Interpretation of r^2
 Example: r^2 = 92.7%.
 Interpretation: Almost 93% of the variability in the amount of water consumed is explained by outside temperature using this model.
 Note: Therefore, about 7% of the variation in the amount of water consumed is not explained by this model using temperature.
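The "explained" and "not explained" percentages can be made concrete by splitting the total variation in water consumption into the part left over after fitting the line (the sum of squared errors) and the rest. The sketch below assumes the usual definition r^2 = 1 - SSE/SST for a least-squares line. The variable names are illustrative.

```python
# Temperature (F) and water consumption (ounces) from the example data.
temps = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def r_squared(xs, ys):
    """Coefficient of determination, computed as 1 - SSE / SST."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Total variation of y around its mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    # Variation left unexplained by the fitted line.
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / sst


r2 = r_squared(temps, water)
print(f"explained by the model:     {r2:.1%}")
print(f"not explained by the model: {1 - r2:.1%}")
```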

Simple Linear Regression Model
The model for simple linear regression is a straight line plus a random error term.
There are mathematical assumptions behind the concepts that we are covering today.

Formulas
Prediction Equation:
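A sketch of the standard least-squares formulas in conventional textbook notation. The symbols β_0, β_1 (model parameters), ε (random error), and b_0, b_1 (their estimates) are assumptions and may differ from the notation used in the original presentation. In the TI-83 output, a corresponds to b_1 and b to b_0.

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i && \text{(model)} \\
\hat{y} &= b_0 + b_1 x && \text{(prediction equation)} \\
b_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
     = r \, \frac{s_y}{s_x} \\
b_0 &= \bar{y} - b_1 \bar{x}
\end{align*}
```

The second expression for the slope ties the regression line back to the correlation coefficient: the slope has the same sign as r and is scaled by the ratio of the two standard deviations.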

Real Life Applications
Cost Estimating for Future Space Flight Vehicles (Multiple Regression)

Nonlinear Application
Predicting when Solar Maximum Will Occur
http://science.msfc.nasa.gov/ssl/pad/solar/predict.htm

 Estimating Seasonal Sales for Department Stores (Periodic)
 Predicting Student Grades Based on Time Spent Studying
 . . .
 What ideas can you think of?
 What ideas can you think of that your students will relate to?

Practice Problems
 Measure Height vs. Arm Span
 Find line of best fit for height.
 Predict height for one student not in data set. Check predictability of model.

 Is there any correlation between shoe size and height?
 Does gender make a difference in this analysis?

 Can the number of points scored in a basketball game be predicted by
 the time a player plays in the game?
 the player’s height?
(Idea modified from Steven King, Aiken, SC. NCTM presentation 1997.)

Resources
 Data Analysis and Statistics. Curriculum and Evaluation Standards for School Mathematics. Addenda Series, Grades 9-12. NCTM. 1992.
 Data and Story Library. Internet Website. http://lib.stat.cmu.edu/DASL/ 2001.

Internet Resources
 Correlation
 Guessing Correlations - An interactive site that allows you to try to match correlation coefficients to scatterplots. University of Illinois, Urbana-Champaign Statistics Program.
http://www.stat.uiuc.edu/~stat100/java/gu

 Effects of adding an Outlier. W. West, University of South Carolina.
http://www.stat.sc.edu/~west/javahtml/R

 Estimate the Regression Line. Compare the mean square error from different regression lines. Can you find the minimum mean square error? Rice University Virtual Statistics Lab.
http://www.ruf.rice.edu/~lane/stat_sim/reg

Internet Resources: Data Sets
 Data and Story Library. Excellent source for small data sets. Search for specific statistical methods (e.g. boxplots, regression) or for data concerning a specific field of interest (e.g. health, environment, sports).
http://lib.stat.cmu.edu/DASL/

Internet Resources: Data Sets
 FEDSTATS. "The gateway to statistics from over 100 U.S. Federal agencies" http://www.fedstats.gov/
 "Kid's Pages." (not all related to statistics) http://www.fedstats.gov/kids.html

 Other
 Statistics Applets. Using Web Applets to Assist in Statistics Instruction. Robin Lock, St. Lawrence University.
http://it.stlawu.edu/~rlock/maa99/

 Other
 Ten Websites Every Statistics Instructor Should Bookmark. Robin Lock, St. Lawrence University.
http://it.stlawu.edu/~rlock/10sites.htm

For More Information…
On-line version of this presentation: http://www.mtsu.edu/~stats/corregpres/index.html
More information about regression: visit the STATS @ MTSU web site, http://www.mtsu.edu/~stats
