Aaker, Kumar, Day (5th Edition)
Correlation & Regression
Regression
What is Regression: A statistical technique used to relate two or more variables.
Objective: Use the independent variable(s) to predict the value of the dependent variable.
Example: For a given advertising expenditure, how much sales will be generated? With a given diet plan, how much weight will an individual be able to lose?
Regression Understanding
A layman Question: Suppose we want to find out how much the age of a car helps to determine its price.
A layman Answer: The older the car, the ______ the price will be.
Regression in Simple Words: As the age of the car increases by one year, the price of the car is estimated to decrease by a certain amount.
Regression in Statistical Terms: Y(Estimated) = b0 + b1 X
Regression Understanding
Data Set: Age & Price of the Cars
Age 1 2 1 2 3 4 3 4 3
Price 90 85 93 84 80 74 81 76 79
What Relation Do you see? A Negative Relationship
A Convenient Way to Look (What is this tool Called?)
[Figure: Price (vertical axis, 70 to 90) plotted against Age (horizontal axis, 1 to 4)]
[Figure: Price (vertical axis, 70 to 90) plotted against Age (horizontal axis, 1 to 4)]
How to Show it Statistically
Y (E) = b0 + b1 X
Y (E) = 97 – 5 X
Y = 97 – 5 X + E
Term: What it is!
Y (E): Dependent variable whose behavior is to be determined
X: Independent variable whose effect is to be determined
b0: Intercept: value of Y(E) when X = 0
b1: Estimated change in Y in response to a unit change in X
E: Difference between the actual and estimated Y
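As a minimal sketch of how b0 and b1 can be obtained, the block below applies ordinary least squares to the nine cars in the data set. The function name `fit_line` is an illustrative choice, not something from the slides, and least squares is assumed to be the estimation method. On this data the fit comes out close to the rounded line Y (E) = 97 – 5 X.

```python
# Age (years) and price of the nine cars from the data set.
ages = [1, 2, 1, 2, 3, 4, 3, 4, 3]
prices = [90, 85, 93, 84, 80, 74, 81, 76, 79]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line Y(E) = b0 + b1 * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: makes the line pass through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = fit_line(ages, prices)
print(f"Y(E) = {b0:.2f} + ({b1:.2f}) X")
```

The negative slope is the statistical version of the layman answer: each extra year of age lowers the estimated price by about b1 units.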
Assessing the Goodness of Fit: Graphical Way
Goodness of Fit Means: How well the model fits the actual data. Smaller residuals mean a good fit; larger residuals mean a bad fit.
[Figure: three panels labeled Bad Fit, Good Fit, Perfect Fit]
Assessing the Goodness of Fit: Statistical Way
[Figure: plot marking Expected Y, Estimated Y and Actual Y]
Here Expected Y is the mean of the actual Y values, Estimated Y is the value on the regression line, and Actual (Real) Y is the observed value.
SSR = Σ (Estimated – Expected)²
SST = Σ (Actual – Expected)²
SSE = Σ (Actual – Estimated)²
Assessing the Goodness of Fit: Statistical Way R²
A good model is one in which SSE is as low as possible; SSE = 0 means a perfect fit.
SST = SSR + SSE
R² = SSR/SST
R² = 1 – SSE/SST
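The following sketch computes the three sums of squares for the car data, taking "Expected" to be the mean of the actual prices (the reading under which SST = SSR + SSE holds) and assuming a least-squares fit. The variable names are illustrative.

```python
ages = [1, 2, 1, 2, 3, 4, 3, 4, 3]
prices = [90, 85, 93, 84, 80, 74, 81, 76, 79]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(prices) / n  # "Expected" Y: the mean of the actual prices

# Least-squares slope and intercept (assumed estimation method).
sxx = sum((x - mean_x) ** 2 for x in ages)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, prices))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
estimated = [b0 + b1 * x for x in ages]

sst = sum((y - mean_y) ** 2 for y in prices)                 # total variation
ssr = sum((e - mean_y) ** 2 for e in estimated)              # explained by the line
sse = sum((y - e) ** 2 for y, e in zip(prices, estimated))   # left unexplained

r_squared = ssr / sst
print(f"SST={sst:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}  R2={r_squared:.3f}")
```

Both forms of R² give the same number because SSR = SST – SSE when the line is fitted by least squares.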
Inferring About the Population
Assumptions, What it means, How to Check it
Expected Value of Residual: E(ei) = 0. Check: no apparent pattern in the residual plot.
Variance of Residual: σe1 = σe2 = … = σei. Check: the residual plot has a consistent spread.
Distribution of Residual: Normal. Check: the histogram is symmetric or normal (histogram & probability plot of residuals).
Dependency of Residuals: Independent.
Relationship b/w IndV & DV: Linear. Check: linear scatter plot.
The Three Conditions Shown Together
As the distribution is symmetric, the mean of the error term will be zero.
The error term is shown to be normally distributed.
The variance of the error term appears to be the same for different values of x.
Analysis of Residuals
If the assumptions of regression are met, the following two conditions hold:
Cond1: A plot of the residuals (e) against the predictor (x) should fall roughly in a horizontal band, symmetric about the x-axis.
Cond2: A normal probability plot of the residuals should be roughly linear.
Residual Analysis
• Examining the residuals (or standardized residuals) helps detect violations of the required conditions.
• Example (continued):
  • Nonnormality.
  • Use Excel to obtain the standardized residual histogram.
  • Examine the histogram and look for a bell-shaped diagram with a mean close to zero.
Residual Analysis
For each residual we calculate the standard deviation as follows:
$s_{r_i} = s_\varepsilon \sqrt{1 - h_i}$, where $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{(n - 1)\, s_x^2}$
Standardized residual i = Residual i / Standard deviation of residual i
A partial list of standard residuals:
Observation, Predicted Price, Residual, Standard Residual
1, 14736.91, -100.91, -0.33
2, 14277.65, -155.65, -0.52
3, 14210.66, -194.66, -0.65
4, 15143.59, 446.41, 1.48
5, 15091.05, 476.95, 1.58
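A rough sketch of the standardized-residual calculation, using the car age and price data from earlier in the deck because the full price data behind the partial table is not given. It assumes h_i = 1/n + (x_i – mean of x)² / ((n – 1) s_x²) and takes the overall standard error to be sqrt(SSE / (n – 2)); that second definition is an assumption, since the slide does not spell it out.

```python
import math

ages = [1, 2, 1, 2, 3, 4, 3, 4, 3]
prices = [90, 85, 93, 84, 80, 74, 81, 76, 79]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(prices) / n
sxx = sum((x - mean_x) ** 2 for x in ages)  # equals (n - 1) * s_x^2
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, prices)) / sxx
b0 = mean_y - b1 * mean_x

residuals = [y - (b0 + b1 * x) for x, y in zip(ages, prices)]

# Assumed definition: standard error of estimate with n - 2 degrees of freedom.
s_e = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# h_i grows as x_i moves away from the mean of x.
leverages = [1 / n + (x - mean_x) ** 2 / sxx for x in ages]

# Each residual is divided by its own standard deviation s_e * sqrt(1 - h_i).
std_residuals = [r / (s_e * math.sqrt(1 - h)) for r, h in zip(residuals, leverages)]

for i, (r, sr) in enumerate(zip(residuals, std_residuals), start=1):
    print(f"{i}: residual={r:7.3f}  standardized={sr:6.2f}")
```

Because each residual gets its own divisor, two residuals of the same size can have different standardized values if their x values sit at different distances from the mean.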
Residual Analysis
[Figure: histogram of standardized residuals; frequency axis 0 to 40, bins -2, -1, 0, 1, 2, More]
The residuals appear to be normally distributed with mean zero.
Heteroscedasticity
• When the requirement of a constant variance is violated, we have a condition of heteroscedasticity.
• Diagnose heteroscedasticity by plotting the residuals against the predicted y.
[Figure: residuals plotted against the predicted value ŷ; the spread increases with ŷ]
Homoscedasticity
• When the requirement of a constant variance is not violated, we have a condition of homoscedasticity.
• Example (continued)
[Figure: Residuals (-1000 to 1000) plotted against Predicted Price (13500 to 16000)]
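The slides diagnose constant variance by eye from the residual plot. As a crude numeric stand-in (not a method from the slides), the sketch below compares the residual variance in the upper half of the predicted values with that in the lower half. The function name `spread_ratio` and both residual series are hypothetical, invented only to illustrate the two cases.

```python
from statistics import pvariance


def spread_ratio(predicted, residuals):
    """Residual variance for the larger predicted values divided by the
    residual variance for the smaller predicted values."""
    pairs = sorted(zip(predicted, residuals))  # order cases by predicted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return pvariance(high) / pvariance(low)


# Hypothetical data for illustration only.
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
fanning = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]   # spread grows with y^
steady = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]    # spread stays the same

print(spread_ratio(predicted, fanning))  # far above 1: suggests heteroscedasticity
print(spread_ratio(predicted, steady))   # about 1: consistent with homoscedasticity
```

A ratio near 1 is what a horizontal band of constant width would produce; the plot remains the primary diagnostic.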
Non Independence of Error Variables
• Data collected over time constitute a time series.
• If the errors are independent, no pattern should be observed when the residuals are examined over time.
• When a pattern is detected, the errors are said to be autocorrelated.
• Autocorrelation can be detected by graphing the residuals against time.
Non Independence of Error Variables
Patterns in the appearance of the residuals over time indicate that autocorrelation exists.
[Figure: two panels of Residual plotted against Time, each with a horizontal line at 0]
Note the runs of positive residuals, replaced by runs of negative residuals.
Note the oscillating behavior of the residuals around zero.
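The two patterns described above can also be summarized with a single number. The sketch below uses the Durbin-Watson statistic, which the slides do not mention and which is brought in here only as an illustration: values near 2 suggest no first-order autocorrelation, values toward 0 suggest runs of same-signed residuals, and values toward 4 suggest oscillation. Both residual series are hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    diffs = sum((residuals[t] - residuals[t - 1]) ** 2
                for t in range(1, len(residuals)))
    total = sum(r ** 2 for r in residuals)
    return diffs / total


# Hypothetical residuals in time order, for illustration only.
runs = [1, 1, 1, 1, -1, -1, -1, -1]         # positive run, then negative run
oscillating = [1, -1, 1, -1, 1, -1, 1, -1]  # sign flips at every step

print(durbin_watson(runs))         # well below 2
print(durbin_watson(oscillating))  # well above 2
```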
Outliers
• An outlier is an observation that is unusually small or large.
• Several possibilities need to be investigated when an outlier is observed:
  • There was an error in recording the value.
  • The point does not belong in the sample.
  • The observation is valid.
• Identify outliers from the scatter diagram.
• It is customary to suspect that an observation is an outlier if its |standard residual| > 2.
Regression Using SPSS
Sequence of Entering Variables
Which Variables to Enter First: The one which is theoretically more important.
If Variables are Uncorrelated: The sequence of entering variables does not have any effect.
But: Real life has more correlated variables than uncorrelated ones.
Some Methods
Hierarchical: First known, then unknown.
Forced/Enter: All together; the only method for testing theory.
Stepwise: The order is selected mathematically by software.
Stepwise Methods
Process
Forward: Start with the constant, then add the predictor that explains the most variation.
Backward: Start with all predictors, then remove the one with the least significance.
Suppression Effect
Forward: Affected by suppression.
Backward: No suppression.
A suppression effect means that a variable has a significant effect only when other variables are held constant. The forward method is more prone to excluding such a variable because of the suppression effect.
Cross Validation
When stepwise methods are used, it is advisable to divide the sample into two groups: one is used to develop the model and the other is used to test it.
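The split described under Cross Validation can be sketched as follows, using the car age and price data from earlier in the deck with a plain one-predictor fit rather than a stepwise procedure. The alternating split and the use of mean absolute error as the test measure are illustrative assumptions; with a real sample the split would normally be random and each group much larger.

```python
ages = [1, 2, 1, 2, 3, 4, 3, 4, 3]
prices = [90, 85, 93, 84, 80, 74, 81, 76, 79]


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Group 1 (even positions) develops the model; group 2 (odd positions) tests it.
dev_x, dev_y = ages[0::2], prices[0::2]
test_x, test_y = ages[1::2], prices[1::2]

b0, b1 = fit_line(dev_x, dev_y)

# Average size of the prediction error on cases the model has not seen.
errors = [abs(y - (b0 + b1 * x)) for x, y in zip(test_x, test_y)]
mae = sum(errors) / len(errors)
print(f"developed on {len(dev_x)} cases: Y(E) = {b0:.2f} + ({b1:.2f}) X")
print(f"tested on {len(test_x)} cases: mean absolute error = {mae:.3f}")
```

If the test error is much larger than the error in the development group, the model has likely been fitted to quirks of that group.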
Accuracy of Regression Model
Diagnostics
• Outliers & Residuals
• Influential Cases
Assumptions
• Variable Type
• Variance Positive
• No Perfect Multicollinearity
• Homoscedasticity
• Independent Errors
• Predictors are uncorrelated with external variables
Diagnostics: Outliers
Outlier
Outlier Effect
How to Identify: Residuals
Unstandardized Residuals
Standardized Residuals (SR): There is an outlier if
• any case has |SR| > 3.29
• more than 1% of sample cases have |SR| > 2.58
• more than 5% of sample cases have |SR| > 1.96
Studentized Residuals: Unstandardized residual divided by a changing standard deviation
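The three standardized-residual rules can be checked mechanically. This is a small sketch with an illustrative function name, `outlier_warnings`, and it assumes the thresholds apply to absolute values; the five sample values are the standard residuals from the partial list shown earlier in the deck.

```python
def outlier_warnings(std_residuals):
    """Apply the 3.29 / 2.58 / 1.96 rules to a list of standardized residuals."""
    n = len(std_residuals)
    abs_sr = [abs(r) for r in std_residuals]
    share_above_258 = sum(a > 2.58 for a in abs_sr) / n
    share_above_196 = sum(a > 1.96 for a in abs_sr) / n
    return {
        "any_above_3.29": any(a > 3.29 for a in abs_sr),
        "more_than_1pct_above_2.58": share_above_258 > 0.01,
        "more_than_5pct_above_1.96": share_above_196 > 0.05,
    }


# Standard residuals from the partial list in the residual analysis example.
sample = [-0.33, -0.52, -0.65, 1.48, 1.58]
print(outlier_warnings(sample))
```

The percentage rules only become meaningful with a reasonably large sample; five cases are used here purely to show the mechanics.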
Diagnostics: Influential Cases
Influential Case: Undue influence on coefficient
Measuring the Effect on Case
Adjusted Predicted Value (APV): Predicted value when that particular case is excluded while developing the model
DFFIT: APV – Original PV (predicted value)
Deleted Residuals: APV – Original OV (observed value)
Studentized Deleted Residuals: Deleted Residuals / Std Dev
Effect On Model
Measure, Effect on Model, Values Cause for Concern
Cook's Distance: Effect on the model. Cause for concern: CD > 1.
Leverage: Influence of the observed value of the outcome variable over the predicted values (0 to 1). Average leverage = (K+1)/n, where K = number of predictors and n = sample size. Cause for concern: L > 2(K+1)/n or L > 3(K+1)/n.
Mahalanobis Distance: Distance of cases from the mean(s) of the predictor variable(s). Cause for concern: N = 500 with 5 predictors, values above 25; N = 100 with 3 predictors, values above 15. Use the Barnett & Lewis table.
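As a hedged illustration of the leverage and Cook's distance checks, the sketch below works them out for the car age and price data with one predictor (K = 1). It assumes the textbook one-predictor forms h_i = 1/n + (x_i – mean of x)² / Sxx and D_i = e_i² / ((K+1) · MSE) · h_i / (1 – h_i)², which may differ in detail from what a given package reports (some report a centered leverage, for instance).

```python
ages = [1, 2, 1, 2, 3, 4, 3, 4, 3]
prices = [90, 85, 93, 84, 80, 74, 81, 76, 79]

n = len(ages)
k = 1  # number of predictors
mean_x = sum(ages) / n
mean_y = sum(prices) / n
sxx = sum((x - mean_x) ** 2 for x in ages)
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, prices)) / sxx
b0 = mean_y - b1 * mean_x

residuals = [y - (b0 + b1 * x) for x, y in zip(ages, prices)]
mse = sum(e ** 2 for e in residuals) / (n - k - 1)

# Leverage of each case and the average leverage (K+1)/n.
leverages = [1 / n + (x - mean_x) ** 2 / sxx for x in ages]
average_leverage = (k + 1) / n

# Cook's distance combines the size of the residual with the leverage.
cooks = [e ** 2 / ((k + 1) * mse) * h / (1 - h) ** 2
         for e, h in zip(residuals, leverages)]

# Positions (0-based) that exceed the cut-offs quoted on the slide.
high_leverage = [i for i, h in enumerate(leverages) if h > 2 * average_leverage]
influential = [i for i, d in enumerate(cooks) if d > 1]

print("high leverage cases:", high_leverage)
print("Cook's distance > 1:", influential)
```

With only nine cases a single car can move the line noticeably, so a flagged case here is a prompt to inspect the observation, not proof that it is wrong.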
Assumptions
Variable Type: Quantitative or Categorical
Variance Positive: Variance > 0
No Perfect Multicollinearity: Predictor variables should not correlate highly
Homoscedasticity: Variance of the residual terms should be constant
Independent Errors
Predictors are uncorrelated with external variables
Multicollinearity
Perfect Collinearity: Perfect collinearity exists when at least one predictor is a perfect linear combination of the others (the simplest example being two predictors that are perfectly correlated: they have a correlation coefficient of 1).
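The simplest case named above, two perfectly correlated predictors, can be detected with a plain Pearson correlation. In this sketch the second predictor, age in months, is a hypothetical variable constructed as 12 times the age in years from the car data, so it is an exact linear function of the first; a correlation of -1 would signal the same problem.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


age_years = [1, 2, 1, 2, 3, 4, 3, 4, 3]
age_months = [12 * a for a in age_years]  # hypothetical second predictor

r = pearson(age_years, age_months)
perfectly_collinear = abs(abs(r) - 1) < 1e-9  # r of +1 or -1
print(f"r = {r:.3f}, perfect collinearity: {perfectly_collinear}")
```

When this happens the two predictors carry the same information, so separate coefficients for them cannot be estimated and one of them has to be dropped.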
