Experimental Techniques in
Thermofluids
MEL7070
Regression Analysis
By: Dr. Shrutidhara Sarma
Regression Analysis
• Also known as curve fitting.
• A suitable plot of the data indicates the nature of the relation between the independent and the dependent variables.
• If the prediction is within the given data range it is interpolation; otherwise it is extrapolation.
• The fit can be linear, semi-log, log-log, or nonlinear.
• Regression also tells us the error incurred in representing the data by that relation.
The linear graph can be of the form:
(i) y = ax + b: linear fit, a straight line on a linear graph
(ii) y = ax^b: power-law fit, a straight line on a log-log graph
(iii) y = a e^{bx}: exponential fit, a straight line on a semi-log graph
The non-linear relationship follows a polynomial relation of the form y = ax^3 + bx^2 + cx + d.
The parameters a, b, c, d are known as the fit parameters and must be determined as part of the regression analysis.
Linear between x and y: y = ax + b (linear fit; a straight line on a linear graph).
Linear between log x and log y: y = ax^b (power-law fit; a straight line on a log-log graph).
Non-linear between x and y: y = ax^3 + bx^2 + cx + d (polynomial relationship; non-linear fit).
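The straight-line forms above can be checked numerically. Below is a minimal Python sketch (function and variable names are mine, not from the lecture) that recovers power-law and exponential fit parameters by fitting a straight line to the log-transformed data, using the least-squares slope and intercept formulas derived in these slides:

```python
import math

def linear_fit(x, y):
    """Least-squares straight line y = a*x + b (slope and intercept from the normal equations)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    a = (sum(u * v for u, v in zip(x, y)) - sx * sy / n) / \
        (sum(u * u for u in x) - sx * sx / n)
    b = sy / n - a * sx / n
    return a, b

x = [1.0 + 0.2 * i for i in range(20)]

# Power law y = 2 x^1.5 becomes a straight line on log-log axes.
slope, intercept = linear_fit([math.log(u) for u in x],
                              [math.log(2.0 * u ** 1.5) for u in x])
a_pow, b_pow = math.exp(intercept), slope      # recovers a = 2, b = 1.5

# Exponential y = 0.5 e^{0.3x} becomes a straight line on semi-log axes.
slope, intercept = linear_fit(x, [math.log(0.5 * math.exp(0.3 * u)) for u in x])
a_exp, b_exp = math.exp(intercept), slope      # recovers a = 0.5, b = 0.3
```

Because the sample data follow the assumed forms exactly, the transformed points fall exactly on a straight line and the original parameters are recovered.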
Least Square method
Let's consider that there is a linear relation between x and y, i.e. their trend is represented by a straight line. In general, the straight line does not pass through any of the data points. If we consider the straight line as a local mean, then the deviations are distributed w.r.t. the local mean as a normal distribution. The least square principle can be applied as:
Minimize

$$ s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_f\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left[y_i - (a x_i + b)\right]^2 $$

where $y_f = ax + b$ is the desired linear fit to the data.
Least square method contd.
s² gives the variance w.r.t. the mean. Hence, minimization requires:

$$ \frac{\partial s^2}{\partial a} = -\frac{2}{n}\sum_{i=1}^{n}\left[y_i - (a x_i + b)\right]x_i = 0; \qquad \frac{\partial s^2}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\left[y_i - (a x_i + b)\right] = 0 $$

These equations may be rearranged as two simultaneous equations for a and b, known as the normal equations:

$$ \left(\sum x_i^2\right) a + \left(\sum x_i\right) b = \sum x_i y_i, \qquad \left(\sum x_i\right) a + n b = \sum y_i $$

Let's define:

$$ \bar{x} = \frac{\sum x_i}{n}, \quad \bar{y} = \frac{\sum y_i}{n}, \quad \sigma_x^2 = \frac{\sum x_i^2}{n} - \bar{x}^2, \quad \sigma_y^2 = \frac{\sum y_i^2}{n} - \bar{y}^2, \quad \sigma_{xy} = \frac{\sum x_i y_i}{n} - \bar{x}\,\bar{y} $$

The last quantity, σ_xy, is known as the covariance; it measures the mutual influence of the variability of x_i on y_i and vice versa.
Least square method contd.
With these definitions, the slope a and intercept b of the line fit may be written as:

$$ a = \frac{\sigma_{xy}}{\sigma_x^2} = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}, \qquad b = \bar{y} - a\bar{x} $$

The regression line passes through the point $(\bar{x}, \bar{y})$ (quickly verify this).

Example: The data below is expected to follow a linear relation y = ax + b. Find the slope and intercept, and find the correlation coefficient.

y_i: 1.2  2.0  2.4  3.5  3.5
x_i: 1.0  1.6  3.4  4.0  5.2
Least square method contd.
Solution: Tabulate the sums needed by the normal equations:

y_i:     1.2   2.0   2.4    3.5   3.5    (sum = 12.6)
x_i:     1.0   1.6   3.4    4.0   5.2    (sum = 15.2)
x_i y_i: 1.2   3.2   8.16   14.0  18.2   (sum = 44.76)
x_i^2:   1.0   2.56  11.56  16.0  27.04  (sum = 58.16)

Use the normal equations:

$$ \left(\sum x_i^2\right) a + \left(\sum x_i\right) b = \sum x_i y_i, \qquad \left(\sum x_i\right) a + n b = \sum y_i $$

i.e. 58.16a + 15.2b = 44.76 and 15.2a + 5b = 12.6.

Answers: a = 0.540; b = 0.878.
Hence, y = 0.540x + 0.878.
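The tabulated sums and the quoted answers can be checked with a short Python sketch (variable names are illustrative, not from the lecture):

```python
y = [1.2, 2.0, 2.4, 3.5, 3.5]
x = [1.0, 1.6, 3.4, 4.0, 5.2]
n = len(x)

sx, sy = sum(x), sum(y)                    # 15.2, 12.6
sxy = sum(u * v for u, v in zip(x, y))     # 44.76
sxx = sum(u * u for u in x)                # 58.16

# Slope and intercept from the normal-equation solution.
a = (sxy - sx * sy / n) / (sxx - sx * sx / n)
b = sy / n - a * sx / n
```

Running this reproduces the tabulated sums and gives a ≈ 0.540, b ≈ 0.878.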
Standard error
Let's say the computed value of y is y_f = ax + b, and suppose there are n data points. A linear fit has 2 parameters, a and b. Hence, DOF = n − p = n − 2 (the two parameters a and b are calculated using the same data). Hence, the standard error is given by:

$$ e = \left\{\frac{\sum_{i=1}^{n}\left(y_i - y_f\right)^2}{n-2}\right\}^{1/2} = \left\{\frac{\sum_{i=1}^{n}\left[y_i - (a x_i + b)\right]^2}{n-2}\right\}^{1/2} $$
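For the worked example, the standard error of the linear fit can be evaluated as follows (a minimal Python sketch that recomputes the fit from the data so it is self-contained; names are illustrative):

```python
import math

x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]
n = len(x)

# Slope and intercept from the normal equations (as in the worked example).
sx, sy = sum(x), sum(y)
a = (sum(u * v for u, v in zip(x, y)) - sx * sy / n) / \
    (sum(u * u for u in x) - sx * sx / n)
b = sy / n - a * sx / n

# Standard error with DOF = n - 2 (two parameters estimated from the same data).
sse = sum((v - (a * u + b)) ** 2 for u, v in zip(x, y))
sigma_e = math.sqrt(sse / (n - 2))
```

Note the denominator n − 2 rather than n: the residuals have already "used up" two degrees of freedom in fixing a and b.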
Goodness of fit
• A measure of how well the regression line represents the data.
• It is possible to fit two lines to the data by treating:
(a) x as the independent variable and y as the dependent variable, or
(b) y as the independent variable and x as the dependent variable, i.e. x = a'y + b'. Then

$$ a' = \frac{\sigma_{xy}}{\sigma_y^2}; \qquad b' = \bar{x} - a'\bar{y} $$

The second fit line may be written as:

$$ y = \frac{1}{a'}x - \frac{b'}{a'} $$

The slope of this line is 1/a', which is not the same as a.
• If the two slopes are the same, the two regression lines coincide.
• The ratio of the slopes of the two lines is a measure of how well the form of the fit matches the data.
Correlation coefficient
The correlation coefficient ρ is defined as:

$$ \rho^2 = \frac{\sigma_{xy}^2}{\sigma_x^2\,\sigma_y^2} = a\,a' = \frac{\text{slope of 1st regression line}}{\text{slope of 2nd regression line}}, \quad \text{or} \quad \rho = \pm\frac{\sigma_{xy}}{\sigma_x \sigma_y} $$

• The sign of the correlation coefficient is determined by the sign of the covariance.
• The correlation is perfect if ρ = ±1.
• The correlation is poor if ρ ≈ 0.
• The absolute value of the correlation coefficient should be greater than 0.5 to indicate that y and x are related.
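The correlation coefficient for the worked example, and the identity ρ² = a·a' relating it to the two regression-line slopes, can be checked with a short Python sketch (names are illustrative):

```python
import math

x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
cov = sum(u * v for u, v in zip(x, y)) / n - xbar * ybar     # sigma_xy
var_x = sum(u * u for u in x) / n - xbar ** 2                # sigma_x^2
var_y = sum(v * v for v in y) / n - ybar ** 2                # sigma_y^2

a = cov / var_x            # slope of the 1st regression line (y on x)
a_prime = cov / var_y      # slope a' of the 2nd line (x on y)
rho = cov / math.sqrt(var_x * var_y)
```

For this data ρ ≈ 0.94, well above 0.5, so x and y are strongly related; ρ² equals a·a' identically, as the definition requires.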
Polynomial regression
Sometimes the data may show a non-linear behavior that may be modeled by a polynomial relation, e.g. y_f = ax^2 + bx + c. The variance of the data with respect to the fit is again minimized with respect to the three fit parameters a, b, c to get three normal equations. The least square principle requires:

$$ s^2 = \frac{1}{n}\sum_{i=1}^{n}\left[y_i - (a x_i^2 + b x_i + c)\right]^2 $$

$$ \frac{\partial s^2}{\partial a} = -\frac{2}{n}\sum_{i=1}^{n}\left[y_i - (a x_i^2 + b x_i + c)\right]x_i^2 = 0; \quad \frac{\partial s^2}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\left[y_i - (a x_i^2 + b x_i + c)\right]x_i = 0; \quad \frac{\partial s^2}{\partial c} = -\frac{2}{n}\sum_{i=1}^{n}\left[y_i - (a x_i^2 + b x_i + c)\right] = 0 $$
Polynomial regression
The earlier equations give:

$$ \left(\sum x_i^4\right)a + \left(\sum x_i^3\right)b + \left(\sum x_i^2\right)c = \sum x_i^2 y_i $$
$$ \left(\sum x_i^3\right)a + \left(\sum x_i^2\right)b + \left(\sum x_i\right)c = \sum x_i y_i $$
$$ \left(\sum x_i^2\right)a + \left(\sum x_i\right)b + nc = \sum y_i $$

These are solved for the fit parameters.
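The three normal equations can be assembled and solved directly. Below is a minimal Python sketch (helper names are mine, not from the lecture) that recovers the coefficients of exact quadratic data by Gaussian elimination:

```python
def quadratic_fit(x, y):
    """Fit y_f = a x^2 + b x + c by solving the three normal equations."""
    n = len(x)
    def sx(p):                     # sum of x_i^p
        return sum(u ** p for u in x)
    def sxy(p):                    # sum of x_i^p * y_i
        return sum((u ** p) * v for u, v in zip(x, y))
    # Augmented matrix [A | rhs] of the normal equations.
    m = [[sx(4), sx(3), sx(2), sxy(2)],
         [sx(3), sx(2), sx(1), sxy(1)],
         [sx(2), sx(1), n,     sxy(0)]]
    for i in range(3):             # forward elimination (the normal-equation
        for j in range(i + 1, 3):  # matrix is positive definite, so no pivoting)
            f = m[j][i] / m[i][i]
            m[j] = [q - f * p for q, p in zip(m[j], m[i])]
    c = m[2][3] / m[2][2]          # back substitution
    b = (m[1][3] - m[1][2] * c) / m[1][1]
    a = (m[0][3] - m[0][2] * c - m[0][1] * b) / m[0][0]
    return a, b, c

x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [2.0 * u * u - 3.0 * u + 1.0 for u in x]   # exact data: a = 2, b = -3, c = 1
a, b, c = quadratic_fit(x, y)
```

Since the sample data lie exactly on a parabola, the fit reproduces a = 2, b = −3, c = 1 to rounding error.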
Goodness of fit and the index of correlation
In the case of a non-linear fit we define a quantity
known as the index of correlation to determine the
goodness of the fit.
$$ \rho = \pm\sqrt{1 - \frac{s^2}{s_y^2}} = \pm\sqrt{1 - \frac{\sum\left(y_i - y_f\right)^2}{\sum\left(y_i - \bar{y}\right)^2}} $$

• If the index of correlation is close to ±1, the fit is considered good.
• The index of correlation is identical to the correlation coefficient for a linear fit.
• The index of correlation compares the scatter of the data with respect to its own mean against the scatter of the data with respect to the regression curve.
General index of correlation
Let's suppose a function z = f(x, y) is fitted as z_f = ax + by + c, with the sum of the squares of the residuals

$$ S = \sum_{i=1}^{n}\left[z_i - (a x_i + b y_i + c)\right]^2 $$

Standard error: with three fit parameters, DOF = n − 3, so the standard error follows from S as {S/(n − 3)}^{1/2} (by analogy with the linear case; the slide states only the label).

LS principle:

$$ \frac{\partial S}{\partial a} = -2\sum_{i=1}^{n}\left[z_i - (a x_i + b y_i + c)\right]x_i = 0; \quad \frac{\partial S}{\partial b} = -2\sum_{i=1}^{n}\left[z_i - (a x_i + b y_i + c)\right]y_i = 0; \quad \frac{\partial S}{\partial c} = -2\sum_{i=1}^{n}\left[z_i - (a x_i + b y_i + c)\right] = 0 $$

Index of correlation:

$$ \rho\ (\text{or } R) = \pm\sqrt{1 - \frac{s^2}{s_z^2}} $$

This basically means the variance w.r.t. the mean compared to the variance w.r.t. the local mean (the fit).
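The same machinery applies to the two-variable fit z_f = ax + by + c. A minimal Python sketch (names illustrative, not from the lecture) that assembles and solves its three normal equations:

```python
def plane_fit(x, y, z):
    """Least-squares z_f = a x + b y + c: solve the 3x3 normal equations."""
    n = len(x)
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    # Augmented matrix [A | rhs] of the normal equations.
    m = [[dot(x, x), dot(x, y), sum(x), dot(x, z)],
         [dot(x, y), dot(y, y), sum(y), dot(y, z)],
         [sum(x),    sum(y),    n,      sum(z)]]
    for i in range(3):                       # forward Gaussian elimination
        for j in range(i + 1, 3):
            f = m[j][i] / m[i][i]
            m[j] = [q - f * p for q, p in zip(m[j], m[i])]
    c = m[2][3] / m[2][2]                    # back substitution
    b = (m[1][3] - m[1][2] * c) / m[1][1]
    a = (m[0][3] - m[0][2] * c - m[0][1] * b) / m[0][0]
    return a, b, c

x = [0.0, 1.0, 2.0, 0.5, 1.5, 2.5]
y = [0.0, 0.5, 1.0, 2.0, 1.0, 0.5]
z = [2.0 * u - 1.5 * v + 0.7 for u, v in zip(x, y)]   # exact plane: a=2, b=-1.5, c=0.7
a, b, c = plane_fit(x, y, z)
```

With data lying exactly on a plane, the fit recovers the generating coefficients; with scattered data the same routine returns the least-squares plane.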
Parity plot
• The data and the fit may be compared by making a parity plot.
• The parity plot is a plot of the given data (z) along the abscissa and the fit (z_f) along the ordinate.
• The parity line is the line of equality between the two.
• The departure of the data from the parity line indicates the quality of the fit. When the data is a function of more than one independent variable, it is not always possible to make plots between the independent and dependent variables; in such a case the parity plot is a way out.
General non-linear fit:
What if the fit equation is a non-linear relation that is neither a polynomial nor reducible to the linear form? Examples:

$$ (1)\ y = a e^{bx} + cx + d \qquad (2)\ y = a e^{b\left(x^2 + c\ln x + d\right)} $$

Here, parameter estimation requires the use of a search method to determine the best parameter set that minimizes the sum of the squares of the residuals, i.e. to find (a, b, …, p) such that S is minimized for y_f = f(x; a, b, c, …, p), a general non-linear function with p parameters, where the sum of the squares of the residuals is given by

$$ S = \sum_{i=1}^{N}\left[y_i - y_f\right]^2 \quad (\text{min}) $$

Hence, choose the parameters such that

$$ \frac{\partial S}{\partial a} = \frac{\partial S}{\partial b} = \frac{\partial S}{\partial c} = \cdots = \frac{\partial S}{\partial p} = 0 $$

In general it is not possible to set the partial derivatives with respect to the parameters to zero to obtain the normal equations and thus obtain the fit parameters.
General non-linear fit:
Let's consider a 3-parameter system with a, b, c as the parameters. Now assume certain starting values (a⁽⁰⁾, b⁽⁰⁾, c⁽⁰⁾), which give some value S⁽⁰⁾ that may not be the minimum. Then evaluate

$$ \left.\frac{\partial S}{\partial a}\right|_{a^{(0)},\,b^{(0)},\,c^{(0)}}, \quad \left.\frac{\partial S}{\partial b}\right|_{a^{(0)},\,b^{(0)},\,c^{(0)}}, \quad \left.\frac{\partial S}{\partial c}\right|_{a^{(0)},\,b^{(0)},\,c^{(0)}} $$

If each of these is zero, then it's a minimum. Now, S being a function of the parameters, S = f(a, b, c), we can write

$$ \nabla S = \frac{\partial S}{\partial a}\hat{a} + \frac{\partial S}{\partial b}\hat{b} + \frac{\partial S}{\partial c}\hat{c} $$

and the minimum is achieved when ∇S = 0. The magnitude of the gradient is

$$ \left|\nabla S\right| = \sqrt{\left(\frac{\partial S}{\partial a}\right)^2 + \left(\frac{\partial S}{\partial b}\right)^2 + \left(\frac{\partial S}{\partial c}\right)^2} $$

and the components of the unit vector along the gradient are

$$ \frac{\partial S/\partial a}{\left|\nabla S\right|}, \quad \frac{\partial S/\partial b}{\left|\nabla S\right|}, \quad \frac{\partial S/\partial c}{\left|\nabla S\right|} $$
General non-linear fit:
To minimize, we now move in a direction opposite to the gradient, reducing each parameter by δ times the corresponding unit-gradient component:

$$ a^{(1)} = a^{(0)} - \delta\,\frac{\partial S/\partial a}{\left|\nabla S\right|}\bigg|_{a^{(0)},\,b^{(0)},\,c^{(0)}}; \quad b^{(1)} = b^{(0)} - \delta\,\frac{\partial S/\partial b}{\left|\nabla S\right|}\bigg|_{a^{(0)},\,b^{(0)},\,c^{(0)}}; \quad c^{(1)} = c^{(0)} - \delta\,\frac{\partial S/\partial c}{\left|\nabla S\right|}\bigg|_{a^{(0)},\,b^{(0)},\,c^{(0)}} $$

Note: δ is the same for all parameters. This is repeated until S reaches its minimum value. Since the iteration moves along the steepest path, it is known as the Steepest Descent method.

NOTE: Initially you may choose larger values for δ, but once the iterate moves close to the minimum you must reduce its magnitude.
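The update rule above can be sketched as a generic routine. This is an illustrative sketch, not the lecture's code: the gradient is taken by central differences rather than analytically, and δ is halved automatically whenever a step fails to reduce S (the slides instead suggest reducing δ manually near the minimum).

```python
import math

def steepest_descent_step(S, params, delta, h=1e-6):
    """One steepest-descent update: move a distance delta against the unit gradient of S."""
    grad = []
    for k in range(len(params)):
        hi = list(params); hi[k] += h
        lo = list(params); lo[k] -= h
        grad.append((S(hi) - S(lo)) / (2.0 * h))   # central-difference dS/dp_k
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:                                # already at a stationary point
        return list(params)
    return [p - delta * g / norm for p, g in zip(params, grad)]

# Toy objective with its minimum at (1, 2, 3).
S = lambda p: (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2 + (p[2] - 3.0) ** 2

params, delta = [0.0, 0.0, 0.0], 0.5
for _ in range(200):
    trial = steepest_descent_step(S, params, delta)
    if S(trial) < S(params):
        params = trial
    else:                       # overshoot near the minimum: shrink the step
        delta *= 0.5
        if delta < 1e-8:
            break
```

Each accepted step reduces S monotonically; the step-halving plays the role of the manual δ reduction described in the note above.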
Example : Steepest Descent
Q. Determine the fit parameters by general non-linear regression if the data follows the form y_f = a e^{bx} + cx.

x: 0      0.2    0.4    0.6   0.8    1.0    1.2    1.4    1.6    1.8
y: 1.196  1.379  1.581  1.79  2.013  2.279  2.545  2.842  3.173  3.5
Solution
Sum of squares of residuals:

$$ S = \sum_{i=1}^{10}\left[y_i - \left(a e^{b x_i} + c x_i\right)\right]^2 $$

Hence,

$$ \frac{\partial S}{\partial a} = \sum_{i=1}^{10} 2\left[y_i - \left(a e^{b x_i} + c x_i\right)\right]\left(-e^{b x_i}\right); \quad \frac{\partial S}{\partial b} = \sum_{i=1}^{10} 2\left[y_i - \left(a e^{b x_i} + c x_i\right)\right]\left(-a x_i e^{b x_i}\right); \quad \frac{\partial S}{\partial c} = \sum_{i=1}^{10} 2\left[y_i - \left(a e^{b x_i} + c x_i\right)\right]\left(-x_i\right) $$

Assume a⁽⁰⁾ = 1, b⁽⁰⁾ = 0.2, c⁽⁰⁾ = 0.1. We get:

$$ S = 11.674; \quad \frac{\partial S}{\partial a} = -24.023; \quad \frac{\partial S}{\partial b} = -30.682; \quad \frac{\partial S}{\partial c} = -23.003 $$
Magnitude of the gradient vector:

$$ \left|\nabla S\right| = \sqrt{\left(\frac{\partial S}{\partial a}\right)^2 + \left(\frac{\partial S}{\partial b}\right)^2 + \left(\frac{\partial S}{\partial c}\right)^2} = 45.251 $$

Hence the components of the unit vector along the gradient:

$$ \frac{\partial S/\partial a}{\left|\nabla S\right|} = \frac{-24.023}{45.251} = -0.531; \quad \frac{\partial S/\partial b}{\left|\nabla S\right|} = \frac{-30.682}{45.251} = -0.678; \quad \frac{\partial S/\partial c}{\left|\nabla S\right|} = \frac{-23.003}{45.251} = -0.508 $$

Hence, with δ = 0.02:

$$ a^{(1)} = 1 - (0.02 \times -0.531) = 1.011; \quad b^{(1)} = 0.2 - (0.02 \times -0.678) = 0.214; \quad c^{(1)} = 0.1 - (0.02 \times -0.508) = 0.110 $$
Now, for these values a⁽¹⁾ = 1.011, b⁽¹⁾ = 0.214, c⁽¹⁾ = 0.110, the new value is S = 10.948. This is repeated until S falls below a pre-specified tolerance (e.g. 0.01 or less).

For this example, calculate the final values of a, b and c with a MATLAB program. #Assignment
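The assignment asks for a MATLAB program; as a non-authoritative Python sketch of the same iteration (with one addition beyond the slides: δ is adapted automatically, grown on success and halved on overshoot, rather than reduced by hand), the analytic gradients from the solution slide can be coded directly:

```python
import math

x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8]
y = [1.196, 1.379, 1.581, 1.79, 2.013, 2.279, 2.545, 2.842, 3.173, 3.5]

def S(a, b, c):
    """Sum of squares of residuals for y_f = a e^{bx} + cx."""
    return sum((yi - (a * math.exp(b * xi) + c * xi)) ** 2 for xi, yi in zip(x, y))

def grad(a, b, c):
    """Analytic dS/da, dS/db, dS/dc from the expressions on the solution slide."""
    ga = gb = gc = 0.0
    for xi, yi in zip(x, y):
        r = yi - (a * math.exp(b * xi) + c * xi)   # residual
        ga += -2.0 * r * math.exp(b * xi)
        gb += -2.0 * r * a * xi * math.exp(b * xi)
        gc += -2.0 * r * xi
    return ga, gb, gc

a, b, c = 1.0, 0.2, 0.1        # starting guess from the slide
delta = 0.02                   # initial step length, as in the slide
for _ in range(5000):
    ga, gb, gc = grad(a, b, c)
    norm = math.sqrt(ga * ga + gb * gb + gc * gc)
    if norm < 1e-6:
        break
    trial = (a - delta * ga / norm, b - delta * gb / norm, c - delta * gc / norm)
    if S(*trial) < S(a, b, c):
        a, b, c = trial
        delta = min(delta * 1.2, 0.1)   # cautiously re-expand the step on success
    else:
        delta *= 0.5                    # overshoot: shrink the step
        if delta < 1e-9:
            break
```

At the starting guess this reproduces the slide's S = 11.674 and gradient (−24.023, −30.682, −23.003); the first update with δ = 0.02 matches (1.011, 0.214, 0.110), and further iterations drive S toward its minimum.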
