From Linear Modeling to
Machine Learning
Statistics for Data Science
CSE357 - Fall 2021
● Ridge Regression (L2 Penalized)
● Lasso Regression (L1 Penalized)
● Feature Selection
● Accuracy Metrics
Linear Models
Finding a linear function based on X to best yield Y.
X = “covariate” = “feature” = “predictor” = “regressor” = “independent variable”
Y = “response variable” = “outcome” = “dependent variable”
Linear Regression
Finding a linear function based on X to best yield Y.
X = “covariate” = “feature” = “predictor” = “regressor” = “independent variable”
Y = “response variable” = “outcome” = “dependent variable”
Regression:
goal: estimate the function r
Linear Regression
Finding a linear function based on X to best yield Y.
X = “covariate” = “feature” = “predictor” = “regressor” = “independent variable”
Y = “response variable” = “outcome” = “dependent variable”
Regression:
goal: estimate the function r, where r(x) = E[Y | X = x]: the expected value of Y, given that the random variable X is equal to some specific value, x.
Linear Regression
Finding a linear function based on X to best yield Y.
X = “covariate” = “feature” = “predictor” = “regressor” = “independent variable”
Y = “response variable” = “outcome” = “dependent variable”
Regression:
goal: estimate the function r
Linear Regression (univariate version):
goal: find β0, β1 such that r(x) = β0 + β1x
Linear Regression
Review: Multiple Linear Regression
Residual: the difference between the observed and predicted outcome, ε̂i = yi - ŷi
Direct Estimates (Normal Equation): β̂ = (XᵀX)⁻¹Xᵀy
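As a rough numpy sketch of those direct estimates (not from the slides; the toy X and y below are made up for illustration), the normal equation can be computed as:

    import numpy as np

    # Toy data: 4 observations, 2 features (made-up values)
    X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
    y = np.array([3.1, 2.4, 4.9, 7.2])

    X1 = np.column_stack([np.ones(len(X)), X])        # add an intercept column of 1s
    beta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)   # beta_hat = (X'X)^-1 X'y
    residuals = y - X1 @ beta_hat                     # residual = observed - predicted
    print(beta_hat, residuals)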
What if Yi ∊ {0, 1}? (i.e. we want "classification")
P(Yi = 0 | X = x). Thus, 0 is class 0 and 1 is class 1.
Review: Logistic Regression
Often we want to make a classification based on multiple features:
Logistic Regression with Multiple Features
Y-axis is Y (i.e. 1 or 0)
Often we want to make a classification based on multiple features:
Y-axis is Y (i.e. 1 or 0)
To make room for multiple Xs, let's get rid of the y-axis. Instead, show the decision point.
Logistic Regression with Multiple Features
Often we want to make a classification based on multiple features:
Logistic Regression with Multiple Features
To make room for multiple Xs, let's get rid of the y-axis. Instead, show the decision point: a decision line in two dimensions, a decision hyperplane in general.
What if Yi ∊ {0, 1}? (i.e. we want "classification")
Review: Logistic Regression
We're learning a linear (i.e. flat) separating hyperplane, but fitting it to a logit outcome.
(https://www.linkedin.com/pulse/predicting-outcomes-probabilities-logistic-regression-konstantinidis/)
What if Yi ∊ {0, 1}? (i.e. we want "classification")
Logistic Regression: Faster Approach
To estimate the coefficients, one can use reweighted least squares (Wasserman, 2005; Li, 2010):
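A minimal sketch of reweighted least squares for logistic regression, assuming numpy and made-up 1-feature data; the slides' exact formulation follows Wasserman (2005) and Li (2010), so treat this only as an illustration:

    import numpy as np

    def irls_logistic(X, y, n_iter=25):
        """Estimate logistic regression betas via (iteratively) reweighted least squares."""
        X1 = np.column_stack([np.ones(len(X)), X])      # add intercept
        beta = np.zeros(X1.shape[1])
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-X1 @ beta))        # current predicted probabilities
            W = p * (1.0 - p)                           # Bernoulli variances used as weights
            z = X1 @ beta + (y - p) / np.clip(W, 1e-9, None)   # working response
            XtW = X1.T * W
            beta = np.linalg.solve(XtW @ X1, XtW @ z)   # weighted least squares step
        return beta

    X = np.array([[0.2], [0.4], [0.6], [0.8], [1.0], [1.2]])  # made-up feature
    y = np.array([0, 0, 1, 0, 1, 1])                          # made-up labels
    print(irls_logistic(X, y))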
Supervised Machine Learning
(Diagram: features X1, X2, X3, e.g. genes, are used to predict an outcome Y, e.g. health.)
Supervised Machine Learning
(Diagram: a small feature set X1, X2, X3 versus a large feature set X1 through Xm, each used to predict Y.)
Supervised Machine Learning
(Diagram: the same small and large feature sets as above.)
Task: Determine a function, f (or parameters to a function) such that f(X) = Y
Supervised Learning
(Diagram: the same small and large feature sets as above.)
Task: Determine a function, f (or parameters to a function) such that f(X) = Y
How to Evaluate?
(Diagram: a Model is fit to the Original Data; does the model hold up on New Data?)
How to Evaluate?
(Diagram: a Model is fit to the Training Data; does the model hold up on the Testing Data?)
ML: GOAL
(Diagram: the Model is fit to the Training Data, training hyperparameters are set using the Development Data, and we then check whether the model holds up on the Testing Data.)
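A hedged scikit-learn sketch of the train/dev/test idea (the 70/10/20 proportions echo the later split slide; the data and variable names are made up):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(100, 3)                 # made-up features
    y = (X.sum(axis=1) > 1.5).astype(int)      # made-up binary outcome

    # Carve off 20% as the held-out test set...
    X_traindev, X_test, y_traindev, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
    # ...then split the rest into training (70% overall) and development (10% overall).
    X_train, X_dev, y_train, y_dev = train_test_split(
        X_traindev, y_traindev, test_size=0.125, random_state=0)  # 0.125 * 0.80 = 0.10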
N-Fold Cross Validation
Goal: Decent estimate of model accuracy
(Diagram: all data is divided into folds; on each iteration (Iter 1, Iter 2, Iter 3, ...) a different fold serves as the test set, another as dev, and the remaining folds as train.)
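One way to run N-fold cross validation in scikit-learn (a sketch; the toy data, model, and number of folds are assumptions, not from the slides):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=200, n_features=6, random_state=0)  # toy data
    # 5 folds: each iteration trains on 4 folds and scores on the held-out fold
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(scores, scores.mean())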
Logistic Regression - Regularization
X (5 observations x 6 features):
0.5   0     0.6   1   0     0.25
0     0.5   0.3   0   0     0
0     0     1     1   1     0.5
0     0     0     0   1     1
0.25  1     1.25  1   0.1   2
Y: 1, 1, 0, 0, 1
Logistic Regression - Regularization
Fitting all 6 features to these 5 observations gives:
logit(Y) = 1.2 - 63*x1 + 179*x2 + 71*x3 + 18*x4 - 59*x5 + 19*x6
"overfitting" (see colab)
Overfitting (1-d example)
(Figures: an underfit model and an overfit model on the same 1-d data; image credit: Scikit-learn; in practice data are rarely this clear)
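The 1-d picture can be sketched with scikit-learn by fitting polynomials of increasing degree (the degrees and noisy data below are made up; the slide's figure itself comes from the Scikit-learn documentation):

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    rng = np.random.RandomState(0)
    x = np.sort(rng.rand(15))
    y = np.cos(1.5 * np.pi * x) + rng.randn(15) * 0.1   # noisy 1-d signal
    X = x[:, None]

    for degree in (1, 4, 15):   # underfit, reasonable, overfit
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
        print(degree, model.score(X, y))   # training R^2 keeps rising even as the fit overfits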
Overfitting (multidimensional example)
Same X, Y, and fitted model as above:
logit(Y) = 1.2 - 63*x1 + 179*x2 + 71*x3 + 18*x4 - 59*x5 + 19*x6
"overfitting": generally due to trying to fit too many features given the number of observations.
Overfitting (multidimensional example): Logistic Regression - Regularization
What if only 2 predictors?
X (x1 and x2 only):
0.5   0
0     0.5
0     0
0     0
0.25  1
Y: 1, 1, 0, 0, 1
Fitted model: logit(Y) = 0 + 2*x1 + 2*x2
A: a better fit
Logistic Regression - Regularization
L1 Regularization - "The Lasso"
Zeros out features by adding a penalty that keeps the model from perfectly fitting the data.
Set betas that minimize the L1 penalized loss (J):
J(β) = loss(β) + λ Σj |βj|
Sometimes written with the L1 norm: J(β) = loss(β) + λ‖β‖1
Solved via gradient descent
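A hedged scikit-learn sketch of L1-penalized logistic regression on the small X and Y from the earlier slides (the solver and C are illustrative choices; note that scikit-learn's C acts roughly as 1/lambda):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[0.5, 0, 0.6, 1, 0, 0.25],
                  [0, 0.5, 0.3, 0, 0, 0],
                  [0, 0, 1, 1, 1, 0.5],
                  [0, 0, 0, 0, 1, 1],
                  [0.25, 1, 1.25, 1, 0.1, 2]])
    y = np.array([1, 1, 0, 0, 1])

    # L1 ("lasso") penalty: coefficients for unhelpful features tend to be zeroed out
    lasso_logit = LogisticRegression(penalty='l1', solver='liblinear', C=0.5).fit(X, y)
    print(lasso_logit.coef_)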
Logistic Regression - Regularization
L2 Regularization - "Ridge"
Shrinks features by adding a penalty that keeps the model from perfectly fitting the data.
Set betas that minimize the L2 penalized loss (J):
J(β) = loss(β) + λ Σj βj²
Sometimes written with the (squared) L2 norm: J(β) = loss(β) + λ‖β‖2²
Solved via gradient descent
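And the L2 (ridge) counterpart, which shrinks coefficients toward zero without typically zeroing them out (again a sketch; C is an illustrative choice):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[0.5, 0, 0.6, 1, 0, 0.25],
                  [0, 0.5, 0.3, 0, 0, 0],
                  [0, 0, 1, 1, 1, 0.5],
                  [0, 0, 0, 0, 1, 1],
                  [0.25, 1, 1.25, 1, 0.1, 2]])
    y = np.array([1, 1, 0, 0, 1])

    ridge_logit = LogisticRegression(penalty='l2', C=0.5).fit(X, y)
    print(ridge_logit.coef_)   # shrunk, but generally nonzero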
Linear Regression - Regularization
L1 Regularization - "The Lasso": zeros out features by adding a penalty that keeps the model from perfectly fitting the data:
J(β) = MSE(β) + λ Σj |βj|
L2 Regularization - "Ridge": shrinks features by adding a penalty that keeps the model from perfectly fitting the data:
J(β) = MSE(β) + λ Σj βj²
mean squared error (MSE): same as RSS but scales better relative to the penalty lambda
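For the linear-regression versions, scikit-learn's Lasso and Ridge take the penalty weight as alpha (a sketch on made-up data; the alpha values are illustrative):

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.RandomState(0)
    X = rng.rand(30, 6)                                    # made-up features
    y = 2 * X[:, 0] - 1 * X[:, 2] + rng.randn(30) * 0.1    # only two features matter

    print(Lasso(alpha=0.05).fit(X, y).coef_)   # L1: unhelpful coefficients pushed toward zero
    print(Ridge(alpha=1.0).fit(X, y).coef_)    # L2: coefficients shrink but stay nonzero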
Machine Learning Goal: Generalize to new data
(Diagram: the data (X, Y) are split into Training Data (80%), used to fit the Model, and Testing Data (20%), used to check whether the model holds up.)
Machine Learning Goal: Generalize to new data
(Diagram: the data (X, Y) are split into Training Data (70%), a Development Set (10%) used to choose the penalty, and Testing Data (20%) used to check whether the model holds up.)
Linear / Logistic Regression - Regularization
Logistic regression: For classification -- when y is binary.
Solving for L1 or L2 regularized loss: (a) gradient descent, (b) reweighted least squares, (c) coordinate descent
Linear regression: For regression -- when y is continuous.
Solving for L1 regularized loss: gradient descent.
Solving for L2 regularized loss: gradient descent OR normal equation: β̂ = (XᵀX + λI)⁻¹Xᵀy
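The ridge normal equation above can be sketched directly in numpy (assuming the penalized form on this slide; whether the intercept is penalized varies by convention, and here it is penalized only to keep the sketch short):

    import numpy as np

    def ridge_normal_equation(X, y, lam=1.0):
        """beta_hat = (X'X + lambda*I)^-1 X'y, with an intercept column included."""
        X1 = np.column_stack([np.ones(len(X)), X])
        return np.linalg.solve(X1.T @ X1 + lam * np.eye(X1.shape[1]), X1.T @ y)

    X = np.random.rand(20, 6)   # made-up data
    y = np.random.rand(20)
    print(ridge_normal_equation(X, y, lam=0.5))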
When do L2 and L1 work well?
Depends on data, but generally we find... (k: number of features; N: number of observations)
      k << N         k < N      k == N   k > N      k >> N
L2    Sometimes      Often      Often    Sometimes  Almost Never
L1    Almost Never   Sometimes  Often    Sometimes  Almost Never
1. Testing the relationship between variables given other
variables. 𝛽 is an “effect size” -- a score for the magnitude
of the relationship; can be tested for significance.
2. Building a predictive model that generalizes to new data.
Ŷ is an estimated value of Y given X.
Linear and Logistic Regression Uses:
1. Testing the relationship between variables given other
variables. 𝛽 is an “effect size” -- a score for the magnitude
of the relationship; can be tested for significance.
2. Building a predictive model that generalizes to new data.
Ŷ is an estimated value of Y given X.
However, unless |X| << the number of observations, the model
might "overfit".
-> Regularized linear regression
Linear and Logistic Regression Uses:
