2. Outline
1. What is regression?
2. Simple linear regression
3. Least Squares
4. Multiple linear regression
5. Logistic regression
6. Other regression models
7. Worked examples
3. What is Regression?
A process of estimating the relationship between variables, i.e., the functional dependency of a dependent variable on one or more independent variables.
Types of Regression
1) Linear Regression
2) Non-linear Regression
5. When do we use Regression?
Prediction of the target variable
Estimating the relationship between the independent and dependent variables
Hypothesis testing
6. General Example of Regression
Source: http://ci.columbia.edu/ci/premba_test/c0331/s7/s7_6.html
7. Simple Linear Regression
Relates a single independent variable x to a single dependent variable y
Statistical relationship of the form: y = β0 + β1 x + ε (simulated in the sketch below)
β0, β1 : unknown parameters
ε : the error in y, a random variable with a normal distribution
Fitness is defined using the sum of squared errors
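As a minimal sketch of this statistical model (all parameter values here are arbitrary, chosen only for illustration), the following Python/NumPy snippet draws observations from y = β0 + β1 x + ε with normally distributed errors:

import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0    # illustrative parameter choices
x = rng.uniform(0, 10, size=50)        # independent variable
eps = rng.normal(0, sigma, size=50)    # error term: normal random variable
y = beta0 + beta1 * x + eps            # dependent variable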
9. Simple Linear Regression
• Goal of simple linear regression: minimize the sum of squared residuals/errors
• A better-fitting line removes much of the sum of squared error.
• A significant regression model should fit the data noticeably better.
• Example: squared residuals of 25, 49, 1, 4, 16, and 25 give a sum of squared errors of 120.
10. Relation with algebra
Slope-intercept form of a line:
• y = mx + b
• y -> dependent variable
• x -> independent variable
• m -> slope (rise/run)
• b -> y-intercept
In the simple linear regression model:
y = β0 + β1 x + ε
ε -> error term
[Figure: plot of y versus x showing the fitted regression line E(y|x)]
11. Least Squares method
Criterion: min Σ (yᵢ − ŷᵢ)²
yᵢ : observed value of the dependent variable
ŷᵢ : estimated value of the dependent variable
12. Least Squares method
No. of minutes studied   Marks
60                       85
240                      97
120                      91
150                      88
200                      94
100                      85
x̄ = 145, ȳ = 90
[Scatter plot of marks versus minutes studied; the best-fit line passes through the centroid (145, 90). The fit is computed in the sketch below.]
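A minimal sketch of the least-squares fit for this table, using the textbook formulas b1 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b0 = ȳ − b1 x̄ (Python with NumPy):

import numpy as np

x = np.array([60, 240, 120, 150, 200, 100])   # minutes studied
y = np.array([85, 97, 91, 88, 94, 85])        # marks

x_bar, y_bar = x.mean(), y.mean()             # centroid: (145, 90)
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar                       # line passes through the centroid

print(b0, b1)   # roughly 80.09 and 0.068: marks ≈ 80.09 + 0.068 × minutes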
14. Ordinary Least Squares assumptions
• Linearity
• Regressors are linearly independent (no perfect multicollinearity)
• Errors are exogenous – E(ε | x) = 0
• Spherical errors
• Errors have constant variance - homoscedastic
• Errors are not auto-correlated
• Errors follow a normal distribution
15. Extended versions of least squares
• Generalized least squares (GLS): weights error terms using their covariance
• Allows estimation of β when the errors are not spherical
• Percentage least squares: uses percent error rather than absolute error
• Multiplicative error rather than additive error
• Iteratively reweighted least squares (IRLS): iteratively applies weighted least squares using improving estimates of the covariance (a sketch follows below)
• Total least squares: also accounts for errors in the independent variables
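As one concrete illustration of the IRLS idea (used here for robust, least-absolute-deviations-style fitting rather than the covariance-based GLS form above; all data are synthetic), weighted least squares is solved repeatedly with weights recomputed from the previous residuals:

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
y = 3.0 + 2.0 * x + rng.normal(0, 1, 40)       # synthetic data
X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept

beta = np.linalg.lstsq(X, y, rcond=None)[0]    # ordinary least squares start
for _ in range(20):
    r = y - X @ beta                           # residuals of the current fit
    w = 1.0 / np.maximum(np.abs(r), 1e-6)      # down-weight large residuals
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # weighted least squares

print(beta)   # approaches the least-absolute-deviations fit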
17. Multiple linear regression
• Relates a single dependent variable to multiple independent variables
• Vector form: yᵢ = xᵢᵀ β + εᵢ
• yᵢ : dependent variable value for the iᵗʰ observation
• xᵢ : p × 1 vector of independent variable values for the iᵗʰ observation
• β : p × 1 vector of unknown parameters for the p independent variables
• εᵢ : error in the iᵗʰ observation
• Matrix form: y = X β + ε (estimated in the sketch below)
• y : n × 1 vector of dependent variable values for n observations
• X : n × p matrix of independent variable values
• β : p × 1 vector of unknown parameters for the p independent variables
• ε : n × 1 vector of errors; random variables with a normal distribution
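A minimal sketch of estimating β in the matrix form via the normal equations β̂ = (XᵀX)⁻¹ Xᵀy, on synthetic data whose true coefficients match the example equation y = 27 + 9x1 + 12x2 used on the next slide:

import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 regressors
beta_true = np.array([27.0, 9.0, 12.0])       # matches the example y = 27 + 9x1 + 12x2
y = X @ beta_true + rng.normal(0, 1, n)       # normally distributed errors

# solve (X'X) beta = X'y instead of inverting X'X explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to [27, 9, 12]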
18. Multiple linear regression: considerations
[Diagram: dependent variable y related to independent variables X1, X2, X3, X4]
Overfitting: too many independent variables relative to the amount of data
Multicollinearity: correlation among the independent variables
[some correlated independent variables then contribute nothing new]
Equation: 𝑦 = β0+ β1 x1+ β2 x2+….+ βp xp+ є
Example: y = 27+9 x1+12 x2
24. Polynomial Regression
The power of the independent variable is greater than 1.
Ex: y = 2 + 9x²
Consideration:
While there might be a temptation to fit a higher-degree polynomial to get lower error, this can result in over-fitting (illustrated in the sketch below).
[Figure: under-fitting, just right, and over-fitting]
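A sketch of that temptation on synthetic data (the true curve is the quadratic above; the noise level and degrees are arbitrary): training error keeps falling as the degree grows, even past the "just right" model.

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 4, 30)
y = 2 + 9 * x**2 + rng.normal(0, 5, x.size)    # quadratic truth plus noise

for degree in (1, 2, 10):
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    sse = np.sum((np.polyval(coeffs, x) - y) ** 2)
    print(degree, sse)   # SSE shrinks with degree even though degree 2 is "just right"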
25. Stepwise Regression
• Used when there are multiple independent variables
• Useful for high-dimensional datasets
• Selection of independent variables is done automatically, without human decision
• Achieved by using statistical values such as R², t-statistics, and the AIC metric to discern significant variables (a sketch follows below)
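A minimal sketch of forward stepwise selection driven by AIC (computed here in the common Gaussian form n·ln(SSE/n) + 2k; the data and the number of candidate variables are synthetic):

import numpy as np

def aic(X, y):
    """AIC of an OLS fit: n*ln(SSE/n) + 2k, assuming Gaussian errors."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sse = np.sum((y - X @ beta) ** 2)
    n, k = X.shape
    return n * np.log(sse / n) + 2 * k

rng = np.random.default_rng(4)
n = 200
Z = rng.normal(size=(n, 5))                    # 5 candidate predictors
y = 1.0 + 3.0 * Z[:, 0] - 2.0 * Z[:, 2] + rng.normal(0, 1, n)  # only 2 matter

selected = []                                  # indices of chosen predictors
current = np.ones((n, 1))                      # intercept-only model
best_aic = aic(current, y)
improved = True
while improved:                                # stop when no variable lowers AIC
    improved = False
    for j in set(range(5)) - set(selected):
        trial = np.column_stack([current, Z[:, j]])
        score = aic(trial, y)
        if score < best_aic:                   # track the best AIC improvement
            best_aic, best_j, improved = score, j, True
    if improved:
        selected.append(best_j)
        current = np.column_stack([current, Z[:, best_j]])

print(selected)   # expected to pick the informative predictors 0 and 2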
26. Ridge Regression
• Used when the data suffer from multicollinearity
• Reduces standard errors by adding a degree of bias to the estimates
• A regularized version of linear regression (see the sketch below)
• Less subject to over-fitting, and easier to interpret
• Extension with automatic variable reduction: Lasso regression
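A minimal sketch of the ridge estimate β̂ = (XᵀX + λI)⁻¹ Xᵀy on deliberately collinear synthetic data (the penalty λ is an arbitrary illustrative choice):

import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.01, n)               # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + 3 * x2 + rng.normal(0, 1, n)

lam = 1.0                                      # ridge penalty (illustrative)
p = X.shape[1]
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_ols)     # may show large, offsetting coefficients
print(beta_ridge)   # shrunk toward stable, similar values near 3 and 3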
27. Other Models
Ecologic regression:
• Used when the data are segmented into several rather large core strata, groups, or bins.
Logic regression:
• All variables are binary; typically used in scoring algorithms.
• A more robust form of logistic regression.
Gradient descent:
• Used for very large datasets, whether large in the number of rows, the number of columns, or both (a sketch follows below).
Jackknife regression:
• A newer type of regression
• Intended to address the drawbacks of traditional regression
• Well suited to black-box predictive algorithms
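A sketch of gradient descent for the linear model (the learning rate and iteration count are arbitrary choices): it reaches the least-squares solution iteratively, without forming or inverting XᵀX, which is what makes it attractive for very large datasets.

import numpy as np

rng = np.random.default_rng(6)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([27.0, 9.0, 12.0]) + rng.normal(0, 1, n)

beta = np.zeros(3)                       # start from zero coefficients
lr = 0.1                                 # learning rate (illustrative)
for _ in range(500):
    grad = 2 / n * X.T @ (X @ beta - y)  # gradient of the mean squared error
    beta -= lr * grad
print(beta)   # converges near [27, 9, 12]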
28. Worked Examples: Performance of Parallel Scaling
Amdahl's Law:
speedup = 1 / ((1 − p) + p / s)
where p is the percentage of the task affected by the performance change, and s is the speedup in the changed portion of the task.
A speedup is a ratio of the times to complete a task. Taking a task from 4 s to 2 s is a speedup of 2.
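A quick worked check of the formula (the example values are illustrative):

def amdahl_speedup(p, s):
    """Overall speedup when fraction p of the task is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# if 90% of a task is spread over 4 processors (s = 4):
print(amdahl_speedup(0.9, 4))   # ~3.08, well short of the ideal 4x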
29. Worked Examples: Regression Approach
Solve using regression.
Amdahl's Law, restated in terms of run time: time(n) = time(1) · ((1 − p) + p / n)
Regression model: time = β0 + β1 · (1 / n), fitted in the sketch below
Time: output (dependent) variable
β0, β1: regression parameters
n: number of processors
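A sketch of this regression on synthetic timings (the measured times, noise level, and processor counts are invented for illustration; regressing time on 1/n recovers the serial and parallelizable portions):

import numpy as np

n_procs = np.array([1, 2, 4, 8, 16, 32])
rng = np.random.default_rng(7)
times = 10.0 * (0.1 + 0.9 / n_procs) + rng.normal(0, 0.05, n_procs.size)  # synthetic

# regress time on 1/n: time = beta0 + beta1 * (1/n)
A = np.column_stack([np.ones_like(n_procs, dtype=float), 1.0 / n_procs])
beta0, beta1 = np.linalg.lstsq(A, times, rcond=None)[0]
print(beta0, beta1)              # serial and parallelizable time, ~1.0 and ~9.0
print(beta1 / (beta0 + beta1))   # recovered parallel fraction p, ~0.9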
31. Worked Examples: Deep Learning Approach
From training data, deep learning develops a model between the input and the output.
Input: a number of processors.
Output: a time.
Trained using gradient descent, with the backpropagation rule for calculating changes in weights.
32. Worked Examples: Deep Learning Solution
Custom deep learning code in Octave.
Input and output layers of one neuron.
Hidden layers of 20, 15, 10 neurons.
Adjacent layers are fully connected.
Individual bias for each neuron.
For each hidden layer, the first half of the neurons have sigmoidal activation and the second half have linear activation (a rough sketch follows below).
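The original code is in Octave and is not reproduced here; as a rough Python/NumPy sketch of only the described architecture (1-20-15-10-1 layer sizes, per-neuron biases, half-sigmoid/half-linear hidden activations; the weight initialization is an assumption), a forward pass might look like:

import numpy as np

sizes = [1, 20, 15, 10, 1]                     # input, three hidden layers, output
rng = np.random.default_rng(8)
weights = [rng.normal(0, 0.5, (m, n)) for m, n in zip(sizes[1:], sizes[:-1])]
biases = [rng.normal(0, 0.5, m) for m in sizes[1:]]   # individual bias per neuron

def forward(x):
    a = np.atleast_1d(x)
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = W @ a + b
        if i < len(weights) - 1:               # hidden layers: mixed activation
            half = z.size // 2
            a = np.concatenate([1 / (1 + np.exp(-z[:half])),  # sigmoidal half
                                z[half:]])                    # linear half
        else:
            a = z                              # single linear output neuron
    return a

print(forward(4.0))   # untrained prediction for n = 4 processors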
34. Conclusion
Regression can model data when a general relationship is expected to exist between the independent and dependent variables.
There are many forms of regression analysis, and some care should be taken to choose a regression model that describes the data.
When a simple model for a dataset is known, regression should probably be tried before a learning algorithm.
35. Referenced Works
Kumar, Sameer, et al. "Achieving strong scaling with NAMD on Blue Gene/L." Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006). IEEE, 2006.
Editor's Notes
Find the centroid. The centroid is the point where the means of the dependent and independent variables meet. It is important because the best-fit line must pass through the centroid.