2. Outline
1. What is regression?
2. Simple linear regression
3. Least Squares
4. Multiple linear regression
5. Logistic regression
6. Other regression models
7. Worked examples
3. What is Regression?
A process of estimating the relationship between variables, i.e., the functional dependency of a dependent variable on one or more independent variables.
Types of Regression
1) Linear Regression
2) Non-linear Regression
5. When do we use Regression?
Prediction of the target variable
Estimating the relationship between the independent and dependent variables
Hypothesis testing
6. General Example of Regression
Source: http://ci.columbia.edu/ci/premba_test/c0331/s7/s7_6.html
7. Simple Linear Regression
Relates a single independent variable x to a single dependent variable y
Statistical relationship of the form: y = β0 + β1 x + ε (simulated in the sketch below)
β0, β1 : unknown parameters
ε : the error in y, a random variable with a normal distribution
Fitness is defined using the sum of squared errors
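As a minimal sketch of this statistical model (all parameter values here are arbitrary, chosen only for illustration), the following Python/NumPy snippet draws observations from y = β0 + β1 x + ε with normally distributed errors:

import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0    # illustrative parameter choices
x = rng.uniform(0, 10, size=50)        # independent variable
eps = rng.normal(0, sigma, size=50)    # error term: normal random variable
y = beta0 + beta1 * x + eps            # dependent variable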
9. Simple Linear Regression
• Goal of simple linear regression: minimize the sum of squared residuals/errors
• A better-fitting line removes much of the sum of squared error.
• A significant regression model should fit the data noticeably better.
• Example: squared residuals of 25, 49, 1, 4, 16, and 25 give a sum of squared errors of 120.
10. Relation with algebra
Slope-intercept form of a line:
• y = mx + b
• y -> dependent variable
• x -> independent variable
• m -> slope (rise/run)
• b -> y-intercept
In the simple linear regression model:
y = β0 + β1 x + ε
ε -> error term
[Figure: plot of y versus x showing the fitted regression line E(y|x)]
11. Least Squares method
Criterion: min Σ (yᵢ − ŷᵢ)²
yᵢ : observed value of the dependent variable
ŷᵢ : estimated value of the dependent variable
12. Least Squares method
No. of minutes studied   Marks
60                       85
240                      97
120                      91
150                      88
200                      94
100                      85
x̄ = 145, ȳ = 90
[Scatter plot of marks versus minutes studied; the best-fit line passes through the centroid (145, 90). The fit is computed in the sketch below.]
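A minimal sketch of the least-squares fit for this table, using the textbook formulas b1 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b0 = ȳ − b1 x̄ (Python with NumPy):

import numpy as np

x = np.array([60, 240, 120, 150, 200, 100])   # minutes studied
y = np.array([85, 97, 91, 88, 94, 85])        # marks

x_bar, y_bar = x.mean(), y.mean()             # centroid: (145, 90)
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar                       # line passes through the centroid

print(b0, b1)   # roughly 80.09 and 0.068: marks ≈ 80.09 + 0.068 × minutes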
14. Ordinary Least Squares assumptions
• Linearity
• Regressors are linearly independent (no perfect multicollinearity)
• Errors are exogenous – E(ε | x) = 0
• Spherical errors
• Errors have constant variance - homoscedastic
• Errors are not auto-correlated
• Errors follow a normal distribution
15. Extended versions of least squares
• Generalized least squares (GLS): weights error terms using their covariance
• Allows estimation of β when the errors are not spherical
• Percentage least squares: uses percent error rather than absolute error
• Multiplicative error rather than additive error
• Iteratively reweighted least squares (IRLS): iteratively applies weighted least squares using improving estimates of the covariance (a sketch follows below)
• Total least squares: also accounts for errors in the independent variables
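As one concrete illustration of the IRLS idea (used here for robust, least-absolute-deviations-style fitting rather than the covariance-based GLS form above; all data are synthetic), weighted least squares is solved repeatedly with weights recomputed from the previous residuals:

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
y = 3.0 + 2.0 * x + rng.normal(0, 1, 40)       # synthetic data
X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept

beta = np.linalg.lstsq(X, y, rcond=None)[0]    # ordinary least squares start
for _ in range(20):
    r = y - X @ beta                           # residuals of the current fit
    w = 1.0 / np.maximum(np.abs(r), 1e-6)      # down-weight large residuals
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # weighted least squares

print(beta)   # approaches the least-absolute-deviations fit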
17. Multiple linear regression
• Relates a single dependent variable to multiple independent variables
• Vector form: yᵢ = xᵢᵀ β + εᵢ
• yᵢ : dependent variable value for the iᵗʰ observation
• xᵢ : p × 1 vector of independent variable values for the iᵗʰ observation
• β : p × 1 vector of unknown parameters for the p independent variables
• εᵢ : error in the iᵗʰ observation
• Matrix form: y = X β + ε (estimated in the sketch below)
• y : n × 1 vector of dependent variable values for n observations
• X : n × p matrix of independent variable values
• β : p × 1 vector of unknown parameters for the p independent variables
• ε : n × 1 vector of errors; random variables with a normal distribution
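A minimal sketch of estimating β in the matrix form via the normal equations β̂ = (XᵀX)⁻¹ Xᵀy, on synthetic data whose true coefficients match the example equation y = 27 + 9x1 + 12x2 used on the next slide:

import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 regressors
beta_true = np.array([27.0, 9.0, 12.0])       # matches the example y = 27 + 9x1 + 12x2
y = X @ beta_true + rng.normal(0, 1, n)       # normally distributed errors

# solve (X'X) beta = X'y instead of inverting X'X explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to [27, 9, 12]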
18. Multiple linear regression: considerations
[Diagram: dependent variable y related to independent variables X1, X2, X3, X4]
Overfitting: too many independent variables relative to the amount of data
Multicollinearity: correlation among the independent variables
[some correlated independent variables then contribute nothing new]
Equation: 𝑦 = β0+ β1 x1+ β2 x2+….+ βp xp+ є
Example: y = 27+9 x1+12 x2
24. Polynomial Regression
The power of the independent variable is greater than 1.
Ex: y = 2 + 9x²
Consideration:
While there might be a temptation to fit a higher-degree polynomial to get lower error, this can result in over-fitting (illustrated in the sketch below).
[Figure: under-fitting, just right, and over-fitting]
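A sketch of that temptation on synthetic data (the true curve is the quadratic above; the noise level and degrees are arbitrary): training error keeps falling as the degree grows, even past the "just right" model.

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 4, 30)
y = 2 + 9 * x**2 + rng.normal(0, 5, x.size)    # quadratic truth plus noise

for degree in (1, 2, 10):
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    sse = np.sum((np.polyval(coeffs, x) - y) ** 2)
    print(degree, sse)   # SSE shrinks with degree even though degree 2 is "just right"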
25. Stepwise Regression
• Used when there are multiple independent variables
• Useful for high-dimensional datasets
• Selection of independent variables is done automatically, without human decision
• Achieved by using statistical values such as R², t-statistics, and the AIC metric to discern significant variables (a sketch follows below)
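A minimal sketch of forward stepwise selection driven by AIC (computed here in the common Gaussian form n·ln(SSE/n) + 2k; the data and the number of candidate variables are synthetic):

import numpy as np

def aic(X, y):
    """AIC of an OLS fit: n*ln(SSE/n) + 2k, assuming Gaussian errors."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sse = np.sum((y - X @ beta) ** 2)
    n, k = X.shape
    return n * np.log(sse / n) + 2 * k

rng = np.random.default_rng(4)
n = 200
Z = rng.normal(size=(n, 5))                    # 5 candidate predictors
y = 1.0 + 3.0 * Z[:, 0] - 2.0 * Z[:, 2] + rng.normal(0, 1, n)  # only 2 matter

selected = []                                  # indices of chosen predictors
current = np.ones((n, 1))                      # intercept-only model
best_aic = aic(current, y)
improved = True
while improved:                                # stop when no variable lowers AIC
    improved = False
    for j in set(range(5)) - set(selected):
        trial = np.column_stack([current, Z[:, j]])
        score = aic(trial, y)
        if score < best_aic:                   # track the best AIC improvement
            best_aic, best_j, improved = score, j, True
    if improved:
        selected.append(best_j)
        current = np.column_stack([current, Z[:, best_j]])

print(selected)   # expected to pick the informative predictors 0 and 2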
26. Ridge Regression
• Used when the data suffer from multicollinearity
• Reduces standard errors by adding a degree of bias to the estimates
• A regularized version of linear regression (see the sketch below)
• Less subject to over-fitting, and easier to interpret
• Extension with automatic variable reduction: Lasso regression
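A minimal sketch of the ridge estimate β̂ = (XᵀX + λI)⁻¹ Xᵀy on deliberately collinear synthetic data (the penalty λ is an arbitrary illustrative choice):

import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.01, n)               # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + 3 * x2 + rng.normal(0, 1, n)

lam = 1.0                                      # ridge penalty (illustrative)
p = X.shape[1]
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_ols)     # may show large, offsetting coefficients
print(beta_ridge)   # shrunk toward stable, similar values near 3 and 3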
27. Other Models
Ecologic regression:
• Used when the data are segmented into several rather large core strata, groups, or bins.
Logic regression:
• All variables are binary; typically used in scoring algorithms.
• A more robust form of logistic regression.
Gradient descent:
• Used for very large datasets, whether large in the number of rows, the number of columns, or both (a sketch follows below).
Jackknife regression:
• A newer type of regression
• Intended to address the drawbacks of traditional regression
• Well suited to black-box predictive algorithms
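A sketch of gradient descent for the linear model (the learning rate and iteration count are arbitrary choices): it reaches the least-squares solution iteratively, without forming or inverting XᵀX, which is what makes it attractive for very large datasets.

import numpy as np

rng = np.random.default_rng(6)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([27.0, 9.0, 12.0]) + rng.normal(0, 1, n)

beta = np.zeros(3)                       # start from zero coefficients
lr = 0.1                                 # learning rate (illustrative)
for _ in range(500):
    grad = 2 / n * X.T @ (X @ beta - y)  # gradient of the mean squared error
    beta -= lr * grad
print(beta)   # converges near [27, 9, 12]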
28. Worked Examples: Performance of Parallel Scaling
Amdahl's Law:
speedup = 1 / ((1 − p) + p / s)
where p is the percentage of the task affected by the performance change, and s is the speedup in the changed portion of the task.
A speedup is a ratio of the times to complete a task. Taking a task from 4 s to 2 s is a speedup of 2.
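A quick worked check of the formula (the example values are illustrative):

def amdahl_speedup(p, s):
    """Overall speedup when fraction p of the task is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# if 90% of a task is spread over 4 processors (s = 4):
print(amdahl_speedup(0.9, 4))   # ~3.08, well short of the ideal 4x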
29. Worked Examples: Regression Approach
Solve using regression.
Amdahl's Law, restated in terms of run time: time(n) = time(1) · ((1 − p) + p / n)
Regression model: time = β0 + β1 · (1 / n), fitted in the sketch below
Time: output (dependent) variable
β0, β1: regression parameters
n: number of processors
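A sketch of this regression on synthetic timings (the measured times, noise level, and processor counts are invented for illustration; regressing time on 1/n recovers the serial and parallelizable portions):

import numpy as np

n_procs = np.array([1, 2, 4, 8, 16, 32])
rng = np.random.default_rng(7)
times = 10.0 * (0.1 + 0.9 / n_procs) + rng.normal(0, 0.05, n_procs.size)  # synthetic

# regress time on 1/n: time = beta0 + beta1 * (1/n)
A = np.column_stack([np.ones_like(n_procs, dtype=float), 1.0 / n_procs])
beta0, beta1 = np.linalg.lstsq(A, times, rcond=None)[0]
print(beta0, beta1)              # serial and parallelizable time, ~1.0 and ~9.0
print(beta1 / (beta0 + beta1))   # recovered parallel fraction p, ~0.9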
31. Worked Examples: Deep Learning Approach
From training data, deep learning develops a model between the input and the output.
Input: a number of processors.
Output: a time.
Trained using gradient descent, with the backpropagation rule for calculating changes in weights.
32. Worked Examples: Deep Learning Solution
Custom deep learning code in Octave.
Input and output layers of one neuron.
Hidden layers of 20, 15, 10 neurons.
Adjacent layers are fully connected.
Individual bias for each neuron.
For each hidden layer, the first half of the neurons have sigmoidal activation and the second half have linear activation (a rough sketch follows below).
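The original code is in Octave and is not reproduced here; as a rough Python/NumPy sketch of only the described architecture (1-20-15-10-1 layer sizes, per-neuron biases, half-sigmoid/half-linear hidden activations; the weight initialization is an assumption), a forward pass might look like:

import numpy as np

sizes = [1, 20, 15, 10, 1]                     # input, three hidden layers, output
rng = np.random.default_rng(8)
weights = [rng.normal(0, 0.5, (m, n)) for m, n in zip(sizes[1:], sizes[:-1])]
biases = [rng.normal(0, 0.5, m) for m in sizes[1:]]   # individual bias per neuron

def forward(x):
    a = np.atleast_1d(x)
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = W @ a + b
        if i < len(weights) - 1:               # hidden layers: mixed activation
            half = z.size // 2
            a = np.concatenate([1 / (1 + np.exp(-z[:half])),  # sigmoidal half
                                z[half:]])                    # linear half
        else:
            a = z                              # single linear output neuron
    return a

print(forward(4.0))   # untrained prediction for n = 4 processors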
34. Conclusion
Regression can model data when a general relationship is expected to exist between the independent and dependent variables.
There are many forms of regression analysis, and some care should be taken to choose a regression model that describes the data.
When a simple model for a dataset is known, regression should probably be tried before a learning algorithm.
35. Referenced Works
Kumar, Sameer, et al. "Achieving strong scaling with NAMD on Blue Gene/L." Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006). IEEE, 2006.
Editor's Notes
Find the centroid. The centroid is the point where the means of the dependent and independent variables meet. It is important because the best-fit line must pass through the centroid.