Mathematics behind Machine Learning: Linear Regression Model
Dr Lotfi Ncib, Associate Professor of Applied Mathematics, Esprit School of Engineering
Disclaimer: Some of the images and content have been taken from multiple online sources. This presentation is intended only for knowledge sharing, not for any commercial purpose.
What is the difference between AI, ML and DL?
• Artificial Intelligence (AI) tries to make computers intelligent so that they can mimic the cognitive functions of humans. AI is therefore a general field with a broad scope, including:
  • Computer vision,
  • Language processing,
  • Creativity…
• Machine Learning (ML) is the branch of AI that covers its statistical part. It teaches the computer to solve problems by looking at hundreds or thousands of examples, learning from them, and then using that experience to solve the same problem in new situations:
  • Regression,
  • Classification,
  • Clustering…
• Deep Learning (DL) is a specialized field of Machine Learning in which computers learn and make decisions using multi-layer neural networks, for example:
  • Convolutional neural networks (CNN),
  • Recurrent neural networks (RNN)…
Types of Machine Learning
[Figure: types of machine learning]
Classical Machine Learning
[Figure: classical machine learning]
What is Regression?
Regression is the process of predicting a continuous value.
• X: independent variable(s)
• Y: dependent variable (a continuous variable)
• Regression is supervised: the target is provided.

Size (ft²)   Bedrooms   Floors   Age of home (years)   Price ($1000)
2104         5          1        45                    460
1416         3          2        40                    232
1534         3          2        30                    315
852          2          1        36                    178
1510         3          2        30                    ?
Types of Regression
• Simple regression (one independent variable)
  • Simple linear regression
  • Simple non-linear regression
  Example: predict Price ($1000) from Size (ft²) for all houses.
• Multiple regression (two or more independent variables)
  • Multiple linear regression
  • Multiple non-linear regression
  Example: predict Price ($1000) from Size (ft²) and number of bedrooms.
Applications of Regression
• Price estimation of a house, from:
  • size, number of bedrooms, and so on.
• Employment income, from:
  • hours of work, education, occupation, sex, age, years of experience, and so on.
Regression analysis is useful in these and many other domains, such as finance, healthcare and retail.
Examples of Regression Algorithms
There are many regression algorithms:
• Ordinal regression
• Poisson regression
• Fast forest quantile regression
• Linear, polynomial, Lasso, stepwise and Ridge regression
• Bayesian linear regression
• Neural network regression
• Decision forest regression
• K-nearest neighbors (KNN)
• Boosted decision tree regression
Simple Linear Regression: Model Representation
Simple Linear Regression
• Predict Price ($1000) from Size (ft²) for all houses
• Independent variable (x): size of house
• Dependent variable (y): price of house

Size in ft² (x)   Price in $1000 (y)
2104              460
1416              232
1534              315
852               178
1245              ?

Notation:
m = number of training examples
x = “input” variable / feature
y = “output” variable / “target” variable
Model representation
Training Set → Learning Algorithm → h (hypothesis)
Size of house → h → Estimated price
Choice of h? Linear regression with one variable (univariate linear regression):
$$h_\theta(x) = \theta_0 + \theta_1 x$$
Cost function
Training set:

Size in ft² (x)   Price in $1000 (y)
2104              460
1416              232
1534              315
852               178

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Goal: find the regression line that makes the sum of squared residuals as small as possible.
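Cost function
Idea: choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training samples.
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost function:
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
Goal:
$$\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$$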
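As a minimal sketch of the cost function above, the snippet below evaluates $J(\theta_0, \theta_1)$ on the four training examples from the slides. The function names `hypothesis` and `cost` and the two parameter guesses are assumptions made for illustration; they do not come from the original deck.

```python
# Training set from the slides: size in ft^2 (x) and price in $1000 (y).
sizes = [2104, 1416, 1534, 852]
prices = [460, 232, 315, 178]


def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = 1/(2m) * sum of squared residuals."""
    m = len(xs)
    squared_errors = (
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    )
    return sum(squared_errors) / (2 * m)


# Two hypothetical parameter guesses, chosen only for illustration.
cost_flat = cost(0.0, 0.0, sizes, prices)   # predicts 0 for every house
cost_slope = cost(0.0, 0.2, sizes, prices)  # predicts $200 per ft^2
print(cost_flat, cost_slope)
```

The second guess gives a much smaller cost than the first, which is the sense in which its line is "closer" to the training samples.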
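Analytical Solution
Cost function:
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
The vectorized form of the linear regression cost function uses:
$$X = \begin{bmatrix} 1 & x^{(1)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}, \quad y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$
Since $\frac{1}{2m}$ is a positive constant, it does not change the minimizer, so we omit it. The cost function then becomes:
$$J(\theta) = (X\theta - y)^T (X\theta - y)$$
This can be rewritten as:
$$J(\theta) = \left( (X\theta)^T - y^T \right)(X\theta - y)$$
Expanding it, we obtain:
$$J(\theta) = (X\theta)^T X\theta - (X\theta)^T y - y^T X\theta + y^T y$$
Since $(X\theta)^T y$ is a scalar, it equals its own transpose: $\left( (X\theta)^T y \right)^T = y^T (X\theta)$. Then:
$$J(\theta) = (X\theta)^T X\theta - 2 y^T X\theta + y^T y$$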
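Analytical Solution
Furthermore, we can write it as:
$$J(\theta) = \theta^T X^T X \theta - 2 y^T X\theta + y^T y$$
Now we need to take the derivative of the cost function. For convenience, the common matrix derivative formulas are listed for reference. Here $X$ denotes a column vector of variables (not the design matrix), and gradients are written as column vectors:
$$\frac{\partial (AX)}{\partial X} = A^T, \quad \frac{\partial (X^T A)}{\partial X} = A, \quad \frac{\partial (X^T X)}{\partial X} = 2X, \quad \frac{\partial (X^T A X)}{\partial X} = AX + A^T X$$
Using the above formulas, we differentiate the cost function with respect to $\theta$:
$$\frac{\partial J(\theta)}{\partial \theta} = 2 X^T X \theta - 2 X^T y$$
To solve for $\theta$, we set this derivative equal to zero:
$$2 X^T X \theta - 2 X^T y = 0 \quad \Rightarrow \quad X^T X \theta = X^T y$$
Thus we can compute $\theta$ as:
$$\theta = (X^T X)^{-1} X^T y$$
• What if $X^T X$ is non-invertible (singular / degenerate)?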
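The closed-form solution can be sketched with NumPy on the four training examples from the slides. This is an illustration under stated assumptions rather than the author's implementation: the variable names are invented here, and `np.linalg.solve` is applied to the normal equations $X^T X \theta = X^T y$ instead of forming $(X^T X)^{-1}$ explicitly, which gives the same $\theta$ when $X^T X$ is invertible.

```python
import numpy as np

# Training set from the slides: size in ft^2 and price in $1000.
sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Design matrix X: a column of ones (intercept) next to the feature column.
X = np.column_stack([np.ones_like(sizes), sizes])

# Solve the normal equations X^T X theta = X^T y.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# At the minimizer the gradient 2 X^T X theta - 2 X^T y is numerically zero.
gradient = 2 * X.T @ X @ theta - 2 * X.T @ y

# Estimated price (in $1000) for the 1245 ft^2 house from the table.
predicted_price = theta[0] + theta[1] * 1245.0
print(theta, gradient, predicted_price)
```

Printing `gradient` is a quick check that the computed $\theta$ really satisfies the zero-derivative condition derived above.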
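Gradient descent
Have some function $J(\theta_0, \theta_1)$.
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.
Outline:
• Start with some $\theta_0, \theta_1$.
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum.

Gradient descent algorithm
Repeat until convergence, for $j = 0$ and $j = 1$, with learning rate $\alpha$:
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$
Correct (simultaneous update):
$temp_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
$temp_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0 := temp_0$
$\theta_1 := temp_1$
Incorrect (the second gradient already uses the updated $\theta_0$):
$temp_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
$\theta_0 := temp_0$
$temp_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_1 := temp_1$

Gradient descent for the linear regression model
Repeat until convergence:
$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$
$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$
Update $\theta_0$ and $\theta_1$ simultaneously.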
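A minimal sketch of gradient descent for the one-variable model, with the simultaneous update made explicit. Several things here are assumptions for illustration and not part of the slides: the function name `gradient_descent`, the learning rate 0.1, the iteration count, the starting point (0, 0), and expressing the sizes in thousands of ft² so that a single learning rate works for both parameters.

```python
# Sizes rescaled to units of 1000 ft^2 (an assumption made so that one
# learning rate suits both parameters); prices are in $1000.
sizes = [2.104, 1.416, 1.534, 0.852]
prices = [460.0, 232.0, 315.0, 178.0]


def gradient_descent(xs, ys, alpha, iterations):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old
        # parameters before either parameter is overwritten.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


theta0, theta1 = gradient_descent(sizes, prices, alpha=0.1, iterations=10000)
print(theta0, theta1)
```

Because the sizes are in thousands of ft², the fitted slope is the price change in $1000 per 1000 ft², and the result agrees with the least-squares line for the same four points.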
Multiple Linear Regression: Model Representation
Model representation

Size (ft²)   Bedrooms   Floors   Age of home (years)   Price ($1000)
2104         5          1        45                    460
1416         3          2        40                    232
1534         3          2        30                    315
852          2          1        36                    178
1510         3          2        30                    ?

Notation:
m = number of training examples
n = number of features (variables)
$x^{(i)}$ = “input” (features) of the $i$-th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
Model representation
Training Set → Learning Algorithm → h (hypothesis)
(Size of house, number of bedrooms, number of floors, age of home) → h → Estimated price
Choice of h?
One variable: $h_\theta(x) = \theta_0 + \theta_1 x$
Four variables: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$
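Model representation
$$h_\theta(x^{(i)}) = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 x_3^{(i)} + \theta_4 x_4^{(i)}$$
For convenience of notation, define $x_0^{(i)} = 1$:
$$x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ x_2^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix} \in \mathbb{R}^{n+1}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$$
Multivariate linear regression
Hypothesis:
$$h_\theta(x^{(i)}) = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 x_3^{(i)} + \theta_4 x_4^{(i)} = \theta^T x^{(i)}$$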
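Cost function
Idea: choose $\theta_0, \theta_1, \dots, \theta_n$ so that $h_\theta(x)$ is close to $y$ for our training samples.
Hypothesis:
$$h_\theta(x^{(i)}) = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 x_3^{(i)} + \theta_4 x_4^{(i)}$$
Parameters: $\theta_0, \theta_1, \dots, \theta_n$
Cost function:
$$J(\theta_0, \theta_1, \dots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
Goal:
$$\min_{\theta_0, \theta_1, \dots, \theta_n} J(\theta_0, \theta_1, \dots, \theta_n)$$
To compute the hypothesis for all the samples at once, we use the following equation:
$$h_\theta(x) = X\theta = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \dots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \dots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_0^{(m)} & x_1^{(m)} & \dots & x_n^{(m)} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$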
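The matrix form $X\theta$ can be sketched with NumPy on the four-feature table from the slides. The parameter vector below is a hypothetical guess chosen only so that there is something to evaluate; it is not a fitted model, and the variable names are assumptions.

```python
import numpy as np

# Features from the table: size (ft^2), bedrooms, floors, age (years).
features = np.array([
    [2104.0, 5.0, 1.0, 45.0],
    [1416.0, 3.0, 2.0, 40.0],
    [1534.0, 3.0, 2.0, 30.0],
    [852.0, 2.0, 1.0, 36.0],
])
y = np.array([460.0, 232.0, 315.0, 178.0])  # price in $1000

m, n = features.shape
# Prepend the constant feature x_0 = 1 so that X has shape (m, n + 1).
X = np.column_stack([np.ones(m), features])

# Hypothetical parameter vector (theta_0 ... theta_4), for illustration only.
theta = np.array([80.0, 0.1, 0.0, 0.0, 0.0])

predictions = X @ theta              # h_theta for all m examples at once
single = theta @ X[0]                # theta^T x^(1) for the first example
residuals = predictions - y
J = residuals @ residuals / (2 * m)  # (1/2m) (X theta - y)^T (X theta - y)
print(predictions, single, J)
```

The single-example product $\theta^T x^{(1)}$ matches the first entry of $X\theta$, which is the point of stacking the examples as rows of $X$.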
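Analytical Solution
Cost function:
$$J(\theta_0, \theta_1, \dots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
The vectorized form of the linear regression cost function uses:
$$X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \dots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \dots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_0^{(m)} & x_1^{(m)} & \dots & x_n^{(m)} \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}, \quad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$
Thus we can compute $\theta$ as:
$$\theta = (X^T X)^{-1} X^T y$$
• What if $X^T X$ is non-invertible (singular / degenerate)?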
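The four-feature table from the slides happens to illustrate the non-invertible case: with $m = 4$ examples and $n + 1 = 5$ parameters, $X$ has rank at most 4, so the $5 \times 5$ matrix $X^T X$ cannot be inverted. The slides leave the question open; the sketch below shows one common option, the Moore-Penrose pseudo-inverse via `np.linalg.pinv`, which is an assumption here and not necessarily the remedy the author intended.

```python
import numpy as np

# Design matrix with x_0 = 1 followed by size, bedrooms, floors, age.
X = np.array([
    [1.0, 2104.0, 5.0, 1.0, 45.0],
    [1.0, 1416.0, 3.0, 2.0, 40.0],
    [1.0, 1534.0, 3.0, 2.0, 30.0],
    [1.0, 852.0, 2.0, 1.0, 36.0],
])
y = np.array([460.0, 232.0, 315.0, 178.0])  # price in $1000

m, num_params = X.shape
rank = np.linalg.matrix_rank(X)
# rank(X^T X) equals rank(X), which is at most m = 4 < 5 here,
# so X^T X is singular and the plain normal equation cannot be used.
is_singular = rank < num_params

# The pseudo-inverse returns the minimum-norm least-squares solution.
theta = np.linalg.pinv(X) @ y
fitted = X @ theta
print(rank, is_singular, fitted)
```

With fewer examples than parameters the fit reproduces the training prices exactly, which says little about how well it would predict a new house; collecting more examples or dropping redundant features are other ways to address the same problem.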
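Gradient descent
Previously (n = 1), repeat until convergence:
$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$
$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$
(simultaneously update $\theta_0, \theta_1$)
New algorithm (n ≥ 1), repeat until convergence:
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
(simultaneously update $\theta_j$ for $j = 0, \dots, n$)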
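In vectorized form, the update for all $\theta_j$ at once is $\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$. The sketch below applies it to a small hypothetical data set generated from $y = 1 + 2x_1 + 3x_2$; the data, the function name, the learning rate and the iteration count are all assumptions chosen so that the result is easy to check.

```python
import numpy as np


def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent for h(x) = theta^T x, with x_0 = 1 in X."""
    m, num_params = X.shape
    theta = np.zeros(num_params)
    for _ in range(iterations):
        # Gradient of J for every theta_j, computed from the current theta.
        gradient = X.T @ (X @ theta - y) / m
        # Assigning the whole vector updates all theta_j simultaneously.
        theta = theta - alpha * gradient
    return theta


# Hypothetical toy data generated from y = 1 + 2*x1 + 3*x2.
X = np.array([
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 2.0],
])
y = np.array([6.0, 8.0, 9.0, 11.0])

theta = gradient_descent(X, y, alpha=0.1, iterations=20000)
print(theta)
```

Because the toy targets are an exact linear function of the features, the iterates approach the generating coefficients (1, 2, 3).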
Gradient descent in practice: Feature Scaling
Idea: make sure features are on a similar scale.
E.g. $x_1$ = size (0-2000 ft²), $x_2$ = number of bedrooms (1-5).
Mean normalization
Replace $x_i$ with $x_i - \mu_i$, where $\mu_i$ is the mean of feature $i$ over the training set, so that features have approximately zero mean (do not apply this to $x_0 = 1$).
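A minimal sketch of mean normalization on the size and bedroom columns from the slides. Subtracting the mean $\mu_i$ follows the text; additionally dividing by the range (max minus min) is an assumption added here so that the two features also end up on a similar scale, and the function name is invented for illustration.

```python
def mean_normalize(values):
    """Return (v - mean) / (max - min) for every value in the list."""
    mu = sum(values) / len(values)
    value_range = max(values) - min(values)  # assumed scaling factor
    return [(v - mu) / value_range for v in values]


# Feature columns from the table; x_0 = 1 is deliberately left unscaled.
sizes = [2104, 1416, 1534, 852]
bedrooms = [5, 3, 3, 2]

scaled_sizes = mean_normalize(sizes)
scaled_bedrooms = mean_normalize(bedrooms)
print(scaled_sizes)
print(scaled_bedrooms)
```

After this step both features have zero mean and values between -1 and 1, so one learning rate is more likely to suit every parameter.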
Gradient descent in practice: Learning Rate
• “Debugging”: how to make sure gradient descent is working correctly.
• How to choose the learning rate $\alpha$.
Summary:
• If $\alpha$ is too small: slow convergence.
• If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; gradient descent may not converge.
To choose $\alpha$, try a range of values.
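One way to "debug" gradient descent is to record the cost $J$ at every iteration and check that it keeps decreasing. The sketch below does this for two learning rates on the one-variable housing data. The specific rates (0.1 and 1.0), the 50 iterations, the rescaling of sizes to thousands of ft² and the function name are assumptions chosen for illustration.

```python
# Sizes in units of 1000 ft^2 (an assumed rescaling); prices in $1000.
sizes = [2.104, 1.416, 1.534, 0.852]
prices = [460.0, 232.0, 315.0, 178.0]


def cost_history(xs, ys, alpha, iterations):
    """Run gradient descent and return the cost J before each update."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    history = []
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / (2 * m))
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return history


def is_decreasing(history):
    """True if the cost never goes up from one iteration to the next."""
    return all(b <= a for a, b in zip(history, history[1:]))


small = cost_history(sizes, prices, alpha=0.1, iterations=50)
large = cost_history(sizes, prices, alpha=1.0, iterations=50)
print(is_decreasing(small), is_decreasing(large))
```

On this data the smaller rate lowers the cost at every step, while the larger one makes the cost grow, which is the symptom of a learning rate that is too large.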
