CS 472 - Regression 1
Regression
 For classification the output(s) are nominal
 In regression the output is continuous
– Function Approximation
 Many models could be used – Simplest is linear regression
– Fit data with the best hyper-plane which "goes through" the points
[Figure: data points plotted as y (dependent variable, output) vs. x (independent variable, input)]
CS 472 - Regression 3
Regression
 For classification the output(s) are nominal
 In regression the output is continuous
– Function Approximation
 Many models could be used – Simplest is linear regression
– Fit data with the best hyper-plane which "goes through" the points
– For each point the difference between the predicted value and the
actual observation is the residual
Simple Linear Regression
 For now, assume just one (input) independent variable x,
and one (output) dependent variable y
– Multiple linear regression assumes an input vector x
– Multivariate linear regression assumes an output vector y
 We "fit" the points with a line (i.e. hyper-plane)
 Which line should we use?
– Choose an objective function
– For simple linear regression we choose sum squared error (SSE)
 Σ (predicted_i – actual_i)² = Σ (residual_i)²
– Thus, find the line which minimizes the sum of the squared
residues (e.g. least squares)
– This exactly mimics the case assuming data points were sampled
from the actual hyperplane with Gaussian noise added
CS 472 - Regression 4
How do we "learn" parameters
 For the 2-d problem (line) there are coefficients for the
bias and the independent variable (y-intercept and slope)
 To find the values for the coefficients which minimize the
objective function we take the partial derivatives of the
objective function (SSE) with respect to the coefficients.
Set these to 0, and solve.
CS 472 - Regression 5
Y = b0 + b1X
b1 = (n Σxy – Σx Σy) / (n Σx² – (Σx)²)
b0 = (Σy – b1 Σx) / n
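A minimal sketch of this closed form in plain Python (the function name and sample data are illustrative, not from the slides):

```python
# Closed-form simple linear regression: fit y = b0 + b1*x to paired lists.
def simple_linear_regression(xs, ys):
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b0 = (sum_y - b1 * sum_x) / n                                  # intercept
    return b0, b1

# Example: points that lie roughly on y = 2x + 1
b0, b1 = simple_linear_regression([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
print(b0, b1)   # approximately 1.09 and 1.94
```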
Multiple Linear Regression
 There is a closed form for finding multiple linear regression
weights which requires matrix inversion, etc.
 There are also iterative techniques to find weights
 One is the delta rule. For regression we use an output node
which is not thresholded (just does a linear sum) and iteratively
apply the delta rule – For regression net is the output
 Where c is the learning rate and xi is the input for that weight
 Delta rule will update towards the objective of minimizing the
SSE, thus solving multiple linear regression
 There are other regression approaches that give different results
by trying to better handle outliers and other statistical anomalies
CS 472 - Regression 6
Δwi = c(t – net)xi
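A sketch of one epoch of this iterative approach (list-based Python; the function name is illustrative, and a bias can be handled by appending a constant 1 to each input vector – an assumption about representation, not something the slides prescribe):

```python
# One epoch of the delta rule for linear regression.
# net is the unthresholded linear sum; each update moves the weights
# in the direction that reduces the squared error on that pattern.
def delta_rule_epoch(patterns, weights, c):
    """patterns: list of (inputs, target) pairs; weights is modified in place."""
    for inputs, target in patterns:
        net = sum(w * x for w, x in zip(weights, inputs))   # linear output
        for i, x in enumerate(inputs):
            weights[i] += c * (target - net) * x            # delta rule update
    return weights
```

With a small enough learning rate and enough epochs, the weights should approach the least-squares solution found by the closed form.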
SSE and Linear Regression
 SSE chooses to square the difference of the predicted vs actual.
 We don't want residuals to cancel each other out.
 Could use absolute or other distances to solve the problem:
Σ |predicted_i – actual_i| : L1 vs L2
 SSE leads to a parabolic error surface
which is good for gradient descent
 Which line would least squares choose?
– There is always one “best” fit with SSE
(L2)
CS 472 - Regression 7
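A toy illustration of the point above (numbers chosen for illustration, not from the slides): two residuals of +2 and −2 cancel if simply summed, while the L2 and L1 objectives do not let them cancel.

```python
residuals = [2, -2]
print(sum(residuals))                   # 0 -> raw residuals cancel
print(sum(r * r for r in residuals))    # 8 -> sum squared error (L2)
print(sum(abs(r) for r in residuals))   # 4 -> sum of absolute errors (L1)
```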
SSE and Linear Regression
 SSE leads to a parabolic error
surface which is good for gradient
descent
 Which line would least squares
choose?
– There is always one “best” fit
 Note that the squared error causes
the model to be more highly
influenced by outliers
– Though best fit assuming Gaussian noise
error from true surface
CS 472 - Regression 10
SSE and Linear Regression Generalization
CS 472 - Regression 11
 In generalization all x values map to a y value on the chosen regression line
[Figure: the chosen regression line; x – input value, y – output value]
Linear Regression - Challenge Question
 Assume we start with all weights as 1 (don't use a bias weight here, though
you normally would – omitting the bias forces the line through the origin)
 Remember for regression we use an output node which is not
thresholded (just does a linear sum) and iteratively apply the delta rule
– thus the net is the output
 What are the new weights after one iteration through the following
training set using the delta rule with a learning rate c = 1
 How does it generalize for the novel input (-.3, 0)?
CS 472 - Regression 12
x1 x2 Target y
.5 -.2 1
1 0 -.4
Δwi = c(t – net)xi
 After one epoch the weight vector is:
A. 1 .5
B. 1.35 .94
C. 1.35 .86
D. .4 .86
E. None of the above
Linear Regression - Challenge Question
 Assume we start with all weights as 1
 What are the new weights after one iteration through the
training set using the delta rule with a learning rate c = 1
 How does it generalize for the novel input (-.3, 0)?
CS 472 - Regression 13
Δwi = c(t – net)xi
x1    x2    Target   Net    w1     w2
                            1      1
.5    -.2   1
1     0     -.4
w1 = 1 +
Linear Regression - Challenge Question
 Assume we start with all weights as 1
 What are the new weights after one iteration through the
training set using the delta rule with a learning rate c = 1
 How does it generalize for the novel input (-.3, 0)?
– -.3*-.4 + 0*.86 = .12
CS 472 - Regression 14
Δwi = c(t – net)xi
x1    x2    Target   Net    w1     w2
                            1      1
.5    -.2   1        .3     1.35   .86
1     0     -.4      1.35   -.4    .86
w1 = 1 + 1(1 – .3).5 = 1.35
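This trace can be reproduced with a short sketch (plain Python; printed values are rounded):

```python
c = 1.0
weights = [1.0, 1.0]                        # w1, w2 (no bias, per the question)
patterns = [([0.5, -0.2], 1.0),             # first training pattern
            ([1.0, 0.0], -0.4)]             # second training pattern

for inputs, target in patterns:
    net = sum(w * x for w, x in zip(weights, inputs))
    weights = [w + c * (target - net) * x for w, x in zip(weights, inputs)]
    print(net, weights)
# net = 0.3,  weights ≈ [1.35, 0.86]
# net = 1.35, weights ≈ [-0.4, 0.86]

# Generalization for the novel input (-.3, 0):
print(-0.3 * weights[0] + 0.0 * weights[1])   # ≈ 0.12
```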
Linear Regression Homework
 Assume we start with all weights as 0 (Include the bias!)
 What are the new weights after one iteration through the
following training set using the delta rule with a learning
rate c = .2
 How does it generalize for the novel input (1, .5)?
CS 472 - Regression 15
x1 x2 Target
.3 .8 .7
-.3 1.6 -.1
.9 0 1.3
Δwi = c(t – net)xi
Intelligibility (Interpretable ML, Transparent)
 One advantage of linear regression models (and linear
classification) is the potential to look at the coefficients to give
insight into which input variables are most important in
predicting the output
 The variables whose coefficients have the largest magnitude have the
strongest correlation with the output
– A large positive coefficient implies that the output will increase when
this input is increased (positively correlated)
– A large negative coefficient implies that the output will decrease when
this input is increased (negatively correlated)
– A small or 0 coefficient suggests that the input is uncorrelated with the
output (at least at the 1st order)
 Linear regression/classification can be used to find best
"indicators"
– Be careful not to confuse correlation with causality
– Linear cannot detect higher order correlations!! The power of more
complex machine learning models.
CS 472 - Regression 16
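A hedged sketch of reading coefficients this way (synthetic data; note that comparing coefficient magnitudes is only meaningful when the inputs are on comparable scales, e.g. standardized – that caveat is mine, not the slides'):

```python
import numpy as np

# Illustrative data: y depends strongly on x1 and weakly (negatively) on x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Standardize inputs so the coefficient magnitudes are comparable.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
Xb = np.column_stack([np.ones(len(Xs)), Xs])      # prepend a bias column

coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)     # [bias, w1, w2]
print(coef)   # roughly [0, 3, -0.5]: x1 is the stronger, positive indicator
```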
Anscombe's Quartet
What lines "really" best fit each case? – different approaches
CS 472 - Regression 17
Delta rule natural for regression, not classification
 First consider the one-dimensional case
 The decision surface for the perceptron would be any (first) point that divides
instances
 Delta rule will try to fit a line through the target values which minimizes SSE
and the decision point is where the line crosses .5 for 0/1 targets. Looking
down on data for perceptron view. Now flip it on its side for delta rule view.
 Will converge to the one optimal line (and dividing point) for this objective
CS 472 - Regression 18
[Figure: 1-d inputs x with 0/1 targets z – perceptron view and delta-rule view]
Δwi = c(t – net)xi
Delta Rule for Classification?
 What would happen in this adjusted case for perceptron and delta rule and
where would the decision point (i.e. .5 crossing) be?
CS 472 - Regression 19
Delta Rule for Classification?
CS 472 - Regression 20
 Leads to misclassifications even though the data is linearly separable
 For Delta rule the objective function is to minimize the regression line SSE,
not maximize classification
Delta Rule for Classification?
 What would happen if we were doing a regression fit with a sigmoid/logistic
curve rather than a line?
CS 472 - Regression 21
Delta Rule for Classification?
 Sigmoid fits many decision cases quite well! This is basically what logistic
regression does.
CS 472 - Regression 22
Observation: Consider the 2 input perceptron case without a bias weight. Note that
the output z is a function of 2 input variables for the 2 input case (x1, x2), and thus we
really have a 3-d decision surface (i.e. a plane accounting for the two input variables
and the 3rd dimension for the output), yet the decision boundary is still a line in the
2-d input space when we represent the outputs with different colors, symbols, etc. The
Delta rule would fit a regression plane to these points with the decision line being that
line where the plane went through .5. What would logistic regression do?
z = 1 if Σi wi xi ≥ θ (sum over i = 1…n)
z = 0 if Σi wi xi < θ
CS 472 - Regression 23
CS 472 - Regression 24
Logistic Regression
 One commonly used algorithm is Logistic Regression
 Assumes that the dependent (output) variable is binary
which is often the case in medical and other studies. (Does
person have disease or not, survive or not, accepted or not,
etc.)
 Like Quadric, Logistic Regression does a particular non-linear
transform on the data after which it just does linear
regression on the transformed data
 Logistic regression fits the data with a sigmoidal/logistic
curve rather than a line and outputs an approximation of
the probability of the output given the input
CS 472 - Regression 25
Logistic Regression Example
 Age (X axis, input variable) – Data is fictional
 Heart Failure (Y axis, 1 or 0, output variable)
 If we use the value of the regression line as a probability approximation
– Extrapolates outside 0-1 and not as good empirically
 Sigmoidal curve to the right gives empirically good probability
approximation and is bounded between 0 and 1
CS 472 - Regression 26
Logistic Regression Approach
Learning
1. Transform initial input probabilities into log odds (logit)
2. Do a standard linear regression using the logit values
– This effectively fits a logistic curve to the data, while still just
doing a linear regression with the transformed input (ala quadric
machine, etc.)
Generalization
1. Find the value for the new input on the logit line
2. Transform that logit value back into a probability
CS 472 - Regression 27
Non-Linear Pre-Process to Logit (Log Odds)
CS 472 - Regression 28
Medication Dosage   # Cured   Total Patients   Probability: # Cured/Total Patients
20                  1         5                .20
30                  2         6                .33
40                  4         6                .67
50                  6         7                .86
[Figures: Cured / Not Cured vs. dosage (0–60), and probability of cure (0–1) vs. dosage]
Logistic Regression Approach
 Could use linear regression with the probability points, but
that would not extrapolate well
 Logistic version is better but how do we get it?
 Similar to Quadric we do a non-linear pre-process of the
input and then do linear regression on the transformed
values – do a linear regression on the log odds - Logit
CS 472 - Regression 30
[Figures: two panels of probability of cure (0–1) vs. dosage (0–60)]
Non-Linear Pre-Process to Logit (Log Odds)
CS 472 - Regression 31
Medication Dosage   # Cured   Total Patients   Probability:      Odds: p/(1-p) =         Logit (Log Odds):
                                               # Cured/Total     # cured / # not cured   ln(Odds)
20                  1         5                .20               .25                     -1.39
30                  2         6                .33               .50                     -0.69
40                  4         6                .67               2.0                     0.69
50                  6         7                .86               6.0                     1.79
[Figures: Cured / Not Cured vs. dosage (0–60), and probability of cure (0–1) vs. dosage]
Regression of Log Odds
CS 472 - Regression 32
Medication Dosage   # Cured   Total Patients   Probability:      Odds: p/(1-p) =         Log Odds:
                                               # Cured/Total     # cured / # not cured   ln(Odds)
20                  1         5                .20               .25                     -1.39
30                  2         6                .33               .50                     -0.69
40                  4         6                .67               2.0                     0.69
50                  6         7                .86               6.0                     1.79
[Figure: regression line through the (dosage, log odds) points; y axis from -2 to +2, x axis 0–60]
• y = .11x – 3.8 – Logit regression equation
• Now we have a regression line for log odds (logit)
• To generalize, we use the log odds value for the new data point
• Then we transform that log odds point to a probability: p = e^logit(x)/(1 + e^logit(x))
• For example, assume we want p for dosage = 10:
Logit(10) = .11(10) – 3.8 = -2.7
p(10) = e^-2.7/(1 + e^-2.7) = .06 [note that we just work backwards from logit to p]
• These p values make up the sigmoidal regression curve (which we never have to actually plot)
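A minimal end-to-end sketch of the procedure above, applied to the dosage table (standard-library Python; the printed coefficients differ slightly from the slides' rounded y = .11x – 3.8, and the variable names are illustrative):

```python
import math

# Logistic regression via the logit transform, on the dosage data above.
dosages = [20, 30, 40, 50]
cured   = [1, 2, 4, 6]
totals  = [5, 6, 6, 7]

# 1. Observed probabilities -> log odds (logit).
probs  = [c / t for c, t in zip(cured, totals)]
logits = [math.log(p / (1 - p)) for p in probs]

# 2. Ordinary least-squares line through the (dosage, logit) points.
n = len(dosages)
sx, sy = sum(dosages), sum(logits)
sxy = sum(x * y for x, y in zip(dosages, logits))
sx2 = sum(x * x for x in dosages)
b1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
b0 = (sy - b1 * sx) / n
print(b1, b0)            # ≈ 0.109 and -3.72 (the slides round to .11 and -3.8)

# 3. Generalization: logit of the new input, then back to a probability.
def predict_prob(x):
    logit = b0 + b1 * x
    return math.exp(logit) / (1 + math.exp(logit))

print(predict_prob(10))  # ≈ 0.07 (the slides' rounded line gives ≈ .06)
```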
Logistic Regression Homework
No longer a required homework
 You don’t actually have to come up with the weights for this one,
though you could do so quickly by using the closed form linear
regression approach
 Sketch each step you would need to learn the weights for the following
data set using logistic regression
 Sketch how you would generalize the probability of a heart attack
given a new input heart rate of 60
CS 472 - Regression 33
Heart Rate Heart Attack
50 Y
50 N
50 N
50 N
70 N
70 Y
90 Y
90 Y
90 N
90 Y
90 Y
Summary
 Linear Regression and Logistic Regression are nice tools
for many simple situations
– But both force us to fit the data with one shape (line or sigmoid)
which will often underfit
 Intelligible results
 When the problem includes more arbitrary non-linearity,
we need more powerful models, which we will introduce
– Though non-linear data transformation can help in these cases
while still using a linear model for learning
 These models are commonly used in data mining
applications and also as a "first attempt" at understanding
data trends, indicators, etc.
CS 472 - Regression 34
Non-Linear Regression
 Note that linear regression is to regression what the
perceptron is to classification
– Simple, useful models which will often underfit
 All of the more powerful classification models which we
will be discussing later in class can also be used for non-linear
regression, though we will mostly discuss them
using classification
– MLP with Backpropagation, Decision Trees, Nearest Neighbor,
etc.
 They can learn functions with arbitrarily complex high
dimensional shapes
CS 472 - Regression 35