Mrs. P. Kalaiselvi, M.Sc., M.A.
Ms. S. Swathi Sundari, M.Sc., M.Phil.
Assumptions :
1. The sample of paired data (x,y) is a
random sample.
2. The pairs of (x,y) data are normally
distributed.
 Correlation measures the degree of linear association
between two scaled variables. It is an analysis of the
relationship between two quantitative outcomes.
 Scatterplot (or scatter diagram)
is a graph in which the paired
(x,y) sample data are plotted with
a horizontal x axis and a vertical
y axis. Each individual (x,y) pair
is plotted as a single point.
 POSITIVE CORRELATION
Examples :
 Height and weight of a batch of students
 Income and expenditure of a family
 NEGATIVE CORRELATION
Examples :
 Price and demand
 Volume v and pressure p of a perfect gas
Scatter Diagram of Paired Data
Restaurant Bill (x) and Tip (y)
[Figure: Sample scatter plots showing various degrees of “positive”
correlation; that is, when x increases, y also increases.
Panels: (a) Positive, (b) Strong positive, (c) Perfect positive.]
[Figure: Sample scatter plots showing various degrees of “negative”
correlation; that is, when x increases, y decreases.
Panels: (d) Negative, (e) Strong negative, (f) Perfect negative.]
[Figure: Panels: (g) No Correlation, (h) Nonlinear Correlation.]
Sample Scatter Plots showing NO linear correlation. The first plot (g)
has no correlation of any kind. The other plot (h) shows a clear
correlation between x and y, but the correlation is NOT LINEAR.
Nonlinear correlation can be studied, but it is beyond this class. We
will only study LINEAR CORRELATION.
 The correlation coefficient is independent of the
change of origin and scale (if the two scale factors have
opposite signs, only the sign of the coefficient changes).
 The values of the linear correlation coefficient are ALWAYS
between -1 and +1.
 If the correlation coefficient = 1 then the correlation
is perfect and positive.
 If the correlation coefficient = -1 then the correlation
is perfect and negative.
 If the correlation coefficient = 0 then the variables
are uncorrelated.
 If the variables x and y are uncorrelated then
Cov (x,y) = 0.
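
The first two properties above can be checked numerically. The following is a minimal sketch, not part of the original slides: the data values, the helper name `pearson_r`, and the shift and scale constants are all assumptions chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, invented for this illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_original = pearson_r(x, y)

# Change of origin and scale: u = (x - 3) / 2, v = (y - 10) / 5.
u = [(xi - 3) / 2 for xi in x]
v = [(yi - 10) / 5 for yi in y]
r_transformed = pearson_r(u, v)

# Perfect positive and perfect negative linear relationships.
r_perfect_pos = pearson_r(x, [2 * xi + 1 for xi in x])
r_perfect_neg = pearson_r(x, [-2 * xi + 1 for xi in x])

print(r_original, r_transformed, r_perfect_pos, r_perfect_neg)
```

Here `r_original` and `r_transformed` should agree up to floating-point rounding, and the two perfect cases should come out at +1 and -1.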
The linear correlation coefficient r can be written as

$$
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^{2} - \left(\sum x\right)^{2}}\;\sqrt{n\sum y^{2} - \left(\sum y\right)^{2}}}
$$
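
The formula for r can be evaluated directly from the five sums it uses. The sketch below assumes a small made-up set of restaurant bills and tips (the slides give no actual data), and the function name `correlation_from_sums` is hypothetical.

```python
import math

def correlation_from_sums(xs, ys):
    """Compute r from n, sum(x), sum(y), sum(xy), sum(x^2), sum(y^2)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: restaurant bill (x) and tip (y), invented for illustration.
bills = [30.0, 50.0, 70.0, 90.0, 110.0]
tips = [5.0, 7.0, 12.0, 14.0, 20.0]

r_bill_tip = correlation_from_sums(bills, tips)
print(round(r_bill_tip, 3))
```

For this made-up sample r comes out close to +1, which corresponds to the "strong positive" panel of the scatter plots.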
 The correlation coefficient is the geometric mean
of the two regression coefficients (y on x and x on y),
taken with the sign common to both coefficients.
 If one of the regression coefficients is greater than
unity, the other must be less than unity.
 The arithmetic mean of the regression coefficients is
greater than or equal to the correlation coefficient.
 Regression coefficients are independent of the
change of origin but dependent on the change of scale.
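
These properties of the regression coefficients can be verified on a small example. This is a sketch under assumptions: the data are invented, and the names `b_yx` (regression coefficient of y on x) and `b_xy` (of x on y) are notation introduced here, not taken from the slides.

```python
import math

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, r) computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sxy / sxx  # slope of the regression of y on x
    b_xy = sxy / syy  # slope of the regression of x on y
    r = sxy / math.sqrt(sxx * syy)
    return b_yx, b_xy, r

# Hypothetical paired data, invented for this illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx, b_xy, r = regression_coefficients(x, y)

geometric_mean = math.sqrt(b_yx * b_xy)  # equals |r|
arithmetic_mean = (b_yx + b_xy) / 2      # at least r

# Change of origin: subtract constants from x and y.
shifted = regression_coefficients([xi - 3 for xi in x], [yi - 10 for yi in y])

# Change of scale: divide x by 2, which doubles b_yx and halves b_xy.
scaled = regression_coefficients([xi / 2 for xi in x], y)

print(b_yx, b_xy, r, geometric_mean, arithmetic_mean)
```

Shifting the origin leaves both coefficients unchanged, while rescaling x changes them; r itself is the same in all three cases.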
Regression Line Plotted on Scatter Plot
The Regression Line is the line of “best fit”
through the data points. “Best fit” means the
sum of the squared vertical distances between
each data point and the regression line is
minimized (the least-squares criterion).
 x is the independent variable (predictor
variable)
 y-hat is the dependent variable
(response variable)
$$
\hat{y} = b_0 + b_1 x
$$
b0 is the y-intercept, or the value at which the regression line crosses the vertical axis.
b1 is the slope of the regression line, or the amount of change in y for every 1 unit change in x.
y-hat is the “dependent” or “response” variable because it depends on, or responds to, the value of x.
x is the “independent” or “predictor” variable because it acts independently to predict the value of y-hat.
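
The roles of b0 and b1 can be seen by fitting a line to a small sample. This is a minimal sketch using the standard least-squares formulas; the bill and tip values and the function names `fit_line`, `predict`, and `sum_squared_errors` are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def predict(b0, b1, x):
    """Predicted response y-hat for a given predictor value x."""
    return b0 + b1 * x

def sum_squared_errors(b0, b1, xs, ys):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - predict(b0, b1, x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: restaurant bill (x) and tip (y), invented for illustration.
bills = [30.0, 50.0, 70.0, 90.0, 110.0]
tips = [5.0, 7.0, 12.0, 14.0, 20.0]

b0, b1 = fit_line(bills, tips)
best_sse = sum_squared_errors(b0, b1, bills, tips)

# Any other line has a larger sum of squared errors on the same data.
nudged_intercept_sse = sum_squared_errors(b0 + 0.5, b1, bills, tips)
nudged_slope_sse = sum_squared_errors(b0, b1 + 0.01, bills, tips)

tip_for_80 = predict(b0, b1, 80.0)
print(b0, b1, tip_for_80)
```

Raising x by one unit raises the predicted y-hat by exactly b1, and nudging either b0 or b1 away from the fitted values increases the sum of squared errors.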
Statistics