Lesson 4

Hanze University of Applied Sciences

### Lesson 4

Hanze University of Applied Sciences Groningen
Ning Ding, PhD
Lecturer of International Business School (IBS)
n.ding@pl.hanze.nl
What are we going to learn?
• Chapter 12: Simple Regression and Correlation 
  – dependent / independent variables 
  – scatter diagrams 
  – regression analysis 
  – Least-squares estimating equation 
  – the coefficient of determination 
  – the coefficient of correlation
Review
What is the interquartile range?
a. 98 b. 1764 c. 854 d. 484 e. 1940 f. 2038

Interquartile Range = Q3-Q1 = 2205-1721 = 484
Review
Q1 location: L = (8+1)*25% = 2.25, so Q1 = 133.5
Q3 location: L = (8+1)*75% = 6.75, so Q3 = 274.5
Interquartile Range = Q3 - Q1 = 274.5 - 133.5 = 141
Boxplot
How to interpret? Median, Quartile, Decile, Percentile
Data: 1, 2, 2, 4, 5, 7, 8, 9, 12
1st Decile | Q1 = 2 | Q3 = 8.5 | 9th Decile
Interquartile Range = Q3 - Q1
http://cnx.org/content/m11192/latest/
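The location rule used in the review, L = (n + 1) × P, can be sketched in Python. This is only a sketch: it assumes linear interpolation between neighbouring ordered values when L is not a whole number, the function name `value_at_location` is illustrative, and the data are the nine values read off the boxplot slide.

```python
def value_at_location(data, p):
    """Return the value at location L = (n + 1) * p in the sorted data.

    Assumption: interpolate linearly between neighbours when L is fractional.
    """
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * p      # 1-based position in the ordered data
    lower = int(loc)       # whole part of the position
    frac = loc - lower     # fractional part of the position
    if lower < 1:          # position falls before the first value
        return xs[0]
    if lower >= n:         # position falls at or after the last value
        return xs[-1]
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


data = [1, 2, 2, 4, 5, 7, 8, 9, 12]
q1 = value_at_location(data, 0.25)   # location 2.5
q3 = value_at_location(data, 0.75)   # location 7.5
iqr = q3 - q1
print(q1, q3, iqr)
```

Under these assumptions the sketch gives Q1 = 2 and Q3 = 8.5, in line with the values marked on the boxplot.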
Review
Boxplot values: € 20, Q1 = € 250, Median = € 350, Mean = € 450, Q3 = € 850, € 2000
a. Positive b. Negative c. Symmetrical d. No idea
The distribution is skewed to the right because the mean is larger than the median.
http://cnx.org/content/m11192/latest/
Mean > Median: Positively skewed
Mean < Median: Negatively skewed
http://qudata.com/online/statcalc/
Zero skewness: mode = median = mean
This means that the data is symmetrically distributed.
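The mean-versus-median rule from the review slides can be written as a small check. This is a rule-of-thumb sketch, not a skewness coefficient; the function name `skew_direction` is illustrative, and the example values are the mean (€ 450) and median (€ 350) from the review question.

```python
def skew_direction(mean, median):
    """Classify skewness by comparing the mean with the median."""
    if mean > median:
        return "positively skewed (to the right)"
    if mean < median:
        return "negatively skewed (to the left)"
    return "symmetrical (zero skewness)"


# Review example: mean = 450, median = 350
print(skew_direction(450, 350))
```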
Regression Analysis
– scatter diagrams
– dependent / independent variables
– regression analysis
– Least-squares estimating equation
– the coefficient of determination
– the coefficient of correlation
Scatter Diagram
A scatter diagram helps us determine both the nature and the strength of a relationship between variables.
Scatter Diagram
Scatter Diagram: Streudiagramm (German), Puntenwolk (Dutch), 散布图 (Chinese)

Positive correlation
Scatter Diagram

Negative correlation
Scatter Diagram

No correlation
Scatter Diagram
Scatter Diagrams:
• Patterns indicating that the variables are related
• If related, we can describe the relationship

Strong & Positive correlation | Weak & Positive correlation
No correlation
Weak & Negative correlation | Strong & Negative correlation
Dependent/Independent Variables
Variables:
– Independent variables: known
– Dependent variables: to predict

Dependent Variable

Independent Variable
Regression Analysis - Correlation & Cause Effect?
• The relationships found by regression are relationships of association.
• They are not necessarily relationships of cause and effect.
Least-squares Estimating Equation
Least-squares estimating equation:
• The dependent variable Y is determined by the independent variable X

Dependent Variable
Y
X
Independent Variable
Ŷ = a + bX
Least-squares Estimating Equation

b = (Σxy - nX̄Ȳ) / (Σx² - nX̄²)

Ŷ = a + bX
a = Ȳ - bX̄
Least-squares Estimating Equation
What is the relationship between the age of a truck (X) and its annual repair expense (Y)?

b = (Σxy - nX̄Ȳ) / (Σx² - nX̄²)
Ŷ = a + bX
a = Ȳ - bX̄

Step 1: X̄=3 Ȳ=6
Step 2: b = (78 - 4*3*6) / (44 - 4*9) = 0.75
Step 3: a = 6 - 0.75*3 = 3.75
Step 4: Ŷ = 3.75 + 0.75X
Step 5: If the city has a truck that is 4 years old, Ŷ = 3.75 + 0.75*4 = 6.75
Step 6: With Y measured in hundreds of dollars, the director could use the equation to predict $675 annually in repairs.
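The six steps above can be reproduced from the summary values on the slide (n = 4, Σxy = 78, Σx² = 44, X̄ = 3, Ȳ = 6). The sketch below is illustrative only; the function name `least_squares_from_sums` is not from the lesson.

```python
def least_squares_from_sums(n, sum_xy, sum_x2, mean_x, mean_y):
    """Return (a, b) for the line Y-hat = a + b*X from summary values."""
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)  # slope
    a = mean_y - b * mean_x                                          # intercept
    return a, b


# Summary values taken from the truck example
a, b = least_squares_from_sums(n=4, sum_xy=78, sum_x2=44, mean_x=3, mean_y=6)

# Predicted repair expense for a 4-year-old truck
y_hat = a + b * 4
print(a, b, y_hat)
```

The result, 6.75, corresponds to the $675 in Step 6 if Y is read in hundreds of dollars.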
Least-squares Estimating Equation
Find the simple linear regression of Auto Sales (Y) on Personal Income (X).

If X=64, what about Y?

Step 1: Count the number of values. n = 5
Step 2: Find XY and X² for each pair of values. See the table below.

a. 4.1 b. 5.3 c. 6.7 d. 7.4 e. 7.5 f. 8.2
Least-squares Estimating Equation

Step 3: Find ΣX, ΣY, ΣXY, ΣX².
ΣX = 311 Mean = 62.2
ΣY = 18.6 Mean = 3.72
ΣXY = 1159.7
ΣX² = 19359

Step 4: b = (Σxy - nX̄Ȳ) / (Σx² - nX̄²)
Substitute the sums into the slope formula above.
Slope(b) = (1159.7 - 5*62.2*3.72) / (19359 - 5*62.2*62.2) = 2.78 / 14.8 ≈ 0.19
Least-squares Estimating Equation
Slope(b) = 0.19
Step 5: Now substitute into the intercept formula given above.
Intercept(a) = Ȳ - bX̄ = 3.72 - 0.19 * 62.2 = -8.098

Step 6: Then substitute these values into the regression equation.
Regression Equation(Ŷ) = a + bX
Ŷ = -8.098 + 0.19X

Suppose we want to know the approximate Y value for X = 64. We substitute this value into the equation above.
Ŷ = -8.098 + 0.19(64) = -8.098 + 12.16 = 4.062 ≈ 4.06
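The same calculation can be checked from the sums given in Step 3. The sketch below also shows the effect of rounding the slope to 0.19 before computing the intercept, as the slides do; the variable names are illustrative.

```python
# Sums from Step 3 of the Personal Income (X) / Auto Sales (Y) example
n = 5
sum_x, sum_y = 311, 18.6
sum_xy, sum_x2 = 1159.7, 19359

mean_x = sum_x / n   # 62.2
mean_y = sum_y / n   # 3.72

# Unrounded slope, intercept, and prediction at X = 64
b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
a = mean_y - b * mean_x
y_hat = a + b * 64

# Same steps with the slope rounded to two decimals, as on the slides
b_rounded = round(b, 2)
a_rounded = mean_y - b_rounded * mean_x
y_hat_rounded = a_rounded + b_rounded * 64

print(round(b, 4), round(y_hat, 3))
print(b_rounded, round(a_rounded, 3), round(y_hat_rounded, 3))
```

Both routes round to 4.06 here, but rounding b early shifts the intercept noticeably, so it is usually safer to round only the final answer.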
Standard Error
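Standard Error of Estimate: measures the goodness of fit of the regression line, that is, how closely the observed points lie around it. The least-squares line is the line that minimizes the sum of the squares of the errors.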

SE
e_i = residual i

Strong correlation | Weak correlation
Standard Error

e_i = residual i
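The formula for the standard error did not survive in this transcript. As an assumption, the usual textbook definition for simple regression, written with the residuals defined above, is:

```latex
s_e = \sqrt{\frac{\sum_{i=1}^{n} e_i^{2}}{n - 2}},
\qquad e_i = Y_i - \hat{Y}_i
```

Here n - 2 is used because two quantities, a and b, are estimated from the data. The smaller s_e is, the closer the points lie to the regression line.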
Coefficient of Determination
Correlation Analysis: describes the degree to which one variable is linearly related to another.

Coefficient of Determination: r²
Measures the extent, or strength, of the association that exists between two variables.

Coefficient of Correlation: r
The square root of the coefficient of determination, taking the sign of the slope b.
Coefficient of Determination
Coefficient of Determination: r²
• 0 ≤ r² ≤ 1.
• The larger r², the stronger the linear relationship.
• The closer r² is to 1, the more confident we are in our prediction.

r²=0.9984
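As a sketch of how r² can be computed, the code below uses the common definition r² = 1 - (unexplained variation / total variation). This definition is an assumption about what the slides intend, and the five data points are hypothetical, chosen only to keep the arithmetic easy.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    a = mean_y - b * mean_x
    # Unexplained variation: squared residuals around the line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations around the mean of Y
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data, not from the lesson
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_squared(xs, ys), 4))
```

For these hypothetical points the line explains 60% of the variation in Y.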
Coefficient of Determination

• 76.30% of the variation in sales is explained by variation in GDP. The remaining 23.70% is explained by other variables.

r²=0.7630
Coefficient of Correlation
Coefficient of correlation: r
• |r| < 0.3 Weak Correlation
• 0.3 ≤ |r| < 0.7 Moderate Correlation
• |r| ≥ 0.7 Strong Correlation
• |r| = 1.0 Perfect Correlation

r²=0.1132
r=0.3365
Coefficient of Correlation
• There is a positive and weak correlation between GDP and Envy Rides' annual sales.
• 11.32% of the variation in sales is explained by variation in GDP. The remaining 88.68% is explained by other variables.

r²=0.1132
r=0.3365
Coefficient of Correlation
• There is a positive and strong correlation between GDP and Envy Rides' annual sales.
• 76.30% of the variation in sales is explained by variation in GDP. The remaining 23.70% is explained by other variables.

r²=0.7630
r=0.8735
Coefficient of Correlation
• There is a positive and almost perfect correlation between GDP and Envy Rides' annual sales.
• 99.84% of the variation in sales is explained by variation in GDP. The remaining 0.16% is explained by other variables.

r²=0.9984
r=0.9992
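The step from r² to r, and the strength labels from the slides, can be sketched as follows. Two assumptions are made: the sign of r is taken from the sign of the slope b, and the thresholds are applied to the absolute value of r with boundaries at 0.3 and 0.7. The function names are illustrative.

```python
import math


def correlation_from_r2(r2, slope):
    """Return r as the square root of r2, signed like the regression slope."""
    r = math.sqrt(r2)
    return r if slope >= 0 else -r


def strength(r):
    """Label the strength of a correlation using the slide thresholds on |r|."""
    size = abs(r)
    if size == 1.0:
        return "perfect"
    if size >= 0.7:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"


# r² values from the GDP / sales slides, with a positive slope assumed
for r2 in (0.7630, 0.9984):
    r = correlation_from_r2(r2, slope=1)
    print(r2, round(r, 4), strength(r))
```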
Review

Which value of r indicates a stronger correlation than 0.40?
A. -0.30
B. -0.50
C. +0.38
D. 0

If all the points on a scatter diagram lie on a straight line, what is the standard error of estimate?
A. -1
B. +1
C. 0
D. Infinity
Review

In the least squares equation, Ŷ = 10 + 20X the value of 20 indicates
A. the Y intercept.
B. for each unit increase in X, Y increases by 20.
C. for each unit increase in Y, X increases by 20.
D. none of these.
Review
A sales manager for an advertising agency believes there is a relationship between the number of contacts and the amount of the sales. To verify this belief, the following data was collected:

What is the Y-intercept of the linear equation?
A. -12.201
B. 2.1946
C. -2.1946
D. 12.201
What have we learnt?
– scatter diagrams
– dependent / independent variables
– regression analysis
– Least-squares estimating equation
– the coefficient of determination
– the coefficient of correlation