Machine Learning
Supervised Learning: Logistic Regression
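Linear regression
• Our goal is to estimate w from training data of <x_i, y_i> pairs, assuming the model y = wx + ε
• One way to find such a relationship is to minimize the least squares error:
$$\hat{w} = \arg\min_w \sum_i (y_i - w x_i)^2$$
[Figure: scatter plot of Y against X with the fitted line y = wx + ε]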
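As a worked step, the minimizer of the least squares error above can be written in closed form for this one-parameter model (no intercept, as in y = wx + ε). This is a standard derivation added for illustration, not taken from the slides. Setting the derivative with respect to w to zero gives:

```latex
\begin{align*}
\frac{d}{dw} \sum_i (y_i - w x_i)^2 &= -2 \sum_i x_i (y_i - w x_i) = 0 \\
\Rightarrow \quad \hat{w} &= \frac{\sum_i x_i y_i}{\sum_i x_i^2}
\end{align*}
```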
Applications of Linear Regression
Marks of a student based on the number of
hours he/she put into the preparation
• Simple linear regression…..
• Multiple linear regression….
• Non-linear problem….
Marks of a student based on the number of
hours he/she put into the preparation
Let's assume that the marks of a student (M) depend on the number
of hours (H) he/she puts into preparation.
The following formula can represent the model:
Marks = function (No. of hours)
=> Marks = m*Hours + c
where m is the slope of the line and c is its intercept.
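The model Marks = m*Hours + c can be sketched as a small function. The parameter values below are hypothetical and chosen only to show the arithmetic; they are not estimates from any data.

```python
def predict_marks(hours, m, c):
    """Linear model from the slide: Marks = m * Hours + c."""
    return m * hours + c


# Hypothetical slope and intercept, for illustration only.
m, c = 5.0, 30.0
print(predict_marks(4.0, m, c))  # 5.0 * 4.0 + 30.0 = 50.0
```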
Marks of a student based on the number of
hours he/she put into the preparation
Let's plot the data to check whether it is a linear problem.
This is the easiest way to find out.
How to determine the slope of the line (the value of m)
• The value of m (the slope of the line) can be
determined using an objective function, which in
general combines a loss function and a
regularization term.
• For simple linear regression, the objective
function is the Mean Squared Error (MSE): the mean
of the squared differences between the actual and
the predicted marks.
• The best-fit line is obtained
by minimizing this objective function.
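A minimal sketch of the objective described above, assuming a small made-up data set of hours and marks. The function name `mse` and both candidate lines are illustrative choices, not part of the slides. A lower value means the candidate line fits the data better.

```python
def mse(hours, marks, m, c):
    """Mean squared error of the line marks = m * hours + c on the data."""
    n = len(hours)
    return sum((y - (m * x + c)) ** 2 for x, y in zip(hours, marks)) / n


# Made-up data, for illustration only.
hours = [1, 2, 3, 4, 5]
marks = [35, 45, 50, 62, 68]

# Two candidate lines: the second one follows the data more closely.
error_a = mse(hours, marks, m=5.0, c=30.0)
error_b = mse(hours, marks, m=8.3, c=27.1)
print(error_a > error_b)  # True: candidate b has the smaller objective
```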
Predicting weight reduction as the
number of kilograms lost
• Let's assume
it depends on input features such as
age, height, the weight of the person, and the
time spent on exercise.
Weight Reduction = Function(Age, Height,
Weight, Time On Exercise)
=> Weight Reduction = b1*Height + b2*Weight +
b3*Age + b4*Time On Exercise + b0
As part of training the above model
Goal:
Find the values of b1, b2, b3, b4, and b0 that
minimize the objective function.
The objective function is the mean squared error,
that is, the average of the squared differences
between the actual value and the predicted
value over the different values of age, height, weight,
and time on exercise.
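The objective for the multi-feature model can be sketched in the same way. The rows, targets, and coefficients below are invented for illustration. The feature order (height, weight, age, exercise time) follows the formula above, and the function names are assumptions.

```python
def predict(features, coeffs, intercept):
    """Linear model: b0 + b1*x1 + b2*x2 + ... for one example."""
    return intercept + sum(b * x for b, x in zip(coeffs, features))


def mean_squared_error(rows, targets, coeffs, intercept):
    """Average squared difference between actual and predicted values."""
    errors = [t - predict(r, coeffs, intercept) for r, t in zip(rows, targets)]
    return sum(e * e for e in errors) / len(errors)


# Made-up examples: (height cm, weight kg, age years, exercise hours/week).
rows = [(170, 80, 30, 5), (160, 90, 45, 3)]
targets = [6.0, 5.0]              # kilograms lost, also made up
coeffs = (0.0, 0.05, -0.02, 0.5)  # hypothetical b1..b4
intercept = 0.0                   # hypothetical b0

objective = mean_squared_error(rows, targets, coeffs, intercept)
print(objective)  # approximately 0.01 for these made-up numbers
```

Training would search for the coefficients that make this value as small as possible.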
Forecasting sales
Organizations often use linear regression models to
forecast future sales.
This can be helpful for things like budgeting and
planning.
Related predictive algorithms, such as Amazon’s item-to-item
collaborative filtering (which is not itself linear regression),
are used to predict what customers will buy in the future
based on their past purchase history.
Cash forecasting
Many businesses use linear regression to forecast how
much cash they’ll have on hand in the future.
This is important for things like managing expenses
and ensuring that there is enough cash on hand to
cover unexpected costs.
Analyzing survey data
Linear regression can also be used to analyze survey
data.
This can help businesses understand things like
customer satisfaction and product preferences.
For example, a company might use linear regression
to figure out how likely people are to recommend its
product to others.
Stock predictions
A lot of businesses use linear regression models to
predict how stocks will perform in the future.
This is done by analyzing past data on stock prices
and trends to identify patterns.
Predicting consumer behavior
Businesses can use linear regression to predict things
like how much a customer is likely to spend.
Regression models can also be used to predict
consumer behavior. This can be helpful for things like
targeted marketing and product development.
For example, Walmart uses linear regression to predict
what products will be popular in different regions of
the country.
Code:
import numpy as np
import matplotlib.pyplot as plt


def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients (slope b_1, intercept b_0)
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)


def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()


def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)


if __name__ == "__main__":
    main()
Logistic Regression…
Regression for classification
• In some cases we can use linear regression for determining the
appropriate boundary.
• However, since the output is usually binary or discrete there are
more efficient regression methods
Regression for classification
• Assume we would like to use linear regression to learn the
parameters for p(y | X ; θ)
• Problems?
[Figure: data points labelled 1 and -1, with the optimal regression model]
$w^T X \ge 0 \Rightarrow$ classify as 1
$w^T X < 0 \Rightarrow$ classify as -1
Logistic Regression
Logistic regression is a predictive modelling
technique in which the target variable (output)
takes discrete values for a given set of features or inputs
(X).
For example, whether someone is COVID-19 positive
(1) or negative (0).
It is a simple yet powerful classification
algorithm that machine learning borrowed from
statistics.
A commonly repeated, though unsourced, estimate is that
around 60% of classification problems can be solved
with the logistic regression algorithm.
Logistic Regression
Logistic regression is one of the most common
machine learning algorithms used for binary
classification.
It predicts the probability of occurrence of a binary
outcome.
Typical applications include fraud detection, spam
detection, and cancer detection.
Sigmoid Function
The sigmoid function, also called the logistic function,
takes any real value and maps it to a value between 0
and 1. Its graph is shaped like the letter “S”.
$$g(h) = \frac{1}{1 + e^{-h}}$$
As the input h goes to positive infinity, the
predicted value of y approaches 1, and as h goes to
negative infinity, the predicted value of y
approaches 0.
If the output of the sigmoid function is greater
than 0.5, we assign the example to class 1 (the
positive class). If it is less than 0.5, we
assign it to class 0 (the negative class).
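The sigmoid and the 0.5 threshold rule can be sketched as follows. The function names are illustrative, and how to treat an output of exactly 0.5 is an assumption here (it is assigned to class 0), since the slides do not specify it.

```python
import math


def sigmoid(h):
    """Logistic function g(h) = 1 / (1 + e^(-h)), written to avoid overflow."""
    if h >= 0:
        return 1.0 / (1.0 + math.exp(-h))
    e = math.exp(h)  # safe: h < 0, so e is in (0, 1)
    return e / (1.0 + e)


def classify(h, threshold=0.5):
    """Class 1 if the sigmoid output exceeds the threshold, else class 0.

    Assumption: a value exactly equal to the threshold goes to class 0.
    """
    return 1 if sigmoid(h) > threshold else 0


print(sigmoid(0.0))                 # 0.5
print(classify(2.0), classify(-2.0))  # 1 0
```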
Difference between Linear & Logistic
Regression
Linear regression is used when the dependent variable is
continuous, for example weight or height.
In contrast, logistic regression is used when the dependent
variable is binary or takes a limited set of values, for example
yes/no, true/false, or 0/1.
Linear regression has historically been applied to yes/no problems
such as predicting disease, but this is risky: its output is not
constrained to lie between 0 and 1, so it cannot be read reliably as
a probability. For example, if the predicted value for a malignant
tumour comes out as 0.4, thresholding at 0.5 labels the cancer as
benign. Logistic regression addresses this by modelling the class
probability directly, and thresholding that probability gives a
binary result.
Logistic regression vs. Linear regression
$$p(y = 0 \mid X; \theta) = g(w^T X) = \frac{1}{1 + e^{w^T X}}$$
$$p(y = 1 \mid X; \theta) = 1 - g(w^T X) = \frac{e^{w^T X}}{1 + e^{w^T X}}$$
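The two expressions above can be related to the sigmoid function g(h) = 1/(1 + e^{-h}) defined earlier. A short check, using only the formulas already given: dividing the numerator and denominator by e^{w^T X} gives

```latex
\begin{align*}
p(y = 1 \mid X; \theta) &= \frac{e^{w^T X}}{1 + e^{w^T X}} = \frac{1}{1 + e^{-w^T X}}, \\
p(y = 0 \mid X; \theta) + p(y = 1 \mid X; \theta) &= \frac{1 + e^{w^T X}}{1 + e^{w^T X}} = 1.
\end{align*}
```

So p(y = 1 | X; θ) is the sigmoid evaluated at h = w^T X, while the g used on this slide is the sigmoid evaluated at h = -w^T X.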
Determining parameters for logistic
regression problems
• So how do we learn the
parameters?
• Similar to other regression problems,
we look for the MLE for w
• The likelihood of the data given
the model is:
$$L(y \mid X; w) = \prod_i \bigl(1 - g(X_i; w)\bigr)^{y_i} \, g(X_i; w)^{(1 - y_i)}$$
where
$$p(y = 0 \mid X; \theta) = g(X; w) = \frac{1}{1 + e^{w^T X}}$$
$$p(y = 1 \mid X; \theta) = 1 - g(X; w) = \frac{e^{w^T X}}{1 + e^{w^T X}}$$
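In practice the logarithm of the likelihood is usually maximized, because a sum is easier to handle than a product. The sketch below evaluates that log-likelihood for given weights, following the slide's convention that p(y = 1 | X) = 1 - g(X; w). The function names and the tiny data set are assumptions made for illustration, and the code is not hardened against very large |w·x|.

```python
import math


def prob_y1(x, w):
    """p(y = 1 | x; w) = e^(w.x) / (1 + e^(w.x)), i.e. 1 - g(x; w)."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(X, y, w):
    """Sum over examples of y*log(p) + (1 - y)*log(1 - p)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = prob_y1(xi, w)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


# Made-up data: the first component of each x is a constant 1 (bias term).
X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, 0]

ll_zero = log_likelihood(X, y, [0.0, 0.0])  # every p is 0.5
ll_better = log_likelihood(X, y, [0.0, 1.0])
print(ll_better > ll_zero)  # True: these weights explain the labels better
```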
Gradient Descent
Gradient descent is an optimization algorithm
used to minimize some function by iteratively
moving in the direction of steepest descent as
defined by the negative of the gradient.
In machine learning, we use gradient descent to
update the parameters of our model.
Gradient Descent
Starting at the top of the mountain, we take our first
step downhill in the direction specified by the
negative gradient.
Next we recalculate the negative gradient (passing in
the coordinates of our new point) and take another
step in the direction it specifies.
We continue this process iteratively until we get to
the bottom of our graph, or to a point where we can
no longer move downhill–a local minimum.
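The iterative procedure described above can be sketched on a one-dimensional example. The function f(w) = (w - 3)^2, its gradient, the starting point, and the step settings are all assumptions chosen so that the minimum (w = 3) is known in advance.

```python
def gradient_descent(grad, w0, learning_rate, n_steps):
    """Repeatedly step in the direction of the negative gradient."""
    w = w0
    for _ in range(n_steps):
        w = w - learning_rate * grad(w)
    return w


def grad_f(w):
    """Gradient of the example function f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_min = gradient_descent(grad_f, w0=0.0, learning_rate=0.1, n_steps=100)
print(round(w_min, 6))  # 3.0
```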
Gradient Descent
Learning Rate
The size of these steps is called the learning rate.
With a high learning rate we can cover more ground each
step, but we risk overshooting the lowest point since the slope
of the hill is constantly changing.
With a very low learning rate, we can confidently move in
the direction of the negative gradient since we are
recalculating it so frequently.
A low learning rate is more precise, but calculating the
gradient is time-consuming, so it will take us a very long time
to get to the bottom.
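The trade-off can be seen on a toy problem. The sketch below minimizes f(w) = w^2 (gradient 2w, minimum at w = 0) from the same starting point with three learning rates. The function and the specific rates are assumptions picked to show slow progress, fast convergence, and overshooting.

```python
def run(learning_rate, n_steps=20, w0=1.0):
    """Gradient descent on f(w) = w**2; returns w after n_steps."""
    w = w0
    for _ in range(n_steps):
        w -= learning_rate * 2.0 * w  # gradient of w**2 is 2*w
    return w


slow = run(0.01)     # small steps: still far from 0 after 20 steps
fast = run(0.4)      # larger steps: very close to 0
diverged = run(1.1)  # too large: each step overshoots and |w| grows

print(abs(fast) < abs(slow) < abs(diverged))  # True
```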
Gradient ascent
[Figure: z plotted against w, showing the slope ∂z/∂w and a step Δw]
• Going in the direction of the slope will lead to a larger z
• But not too far, otherwise we would go beyond the
optimal w
Gradient descent
[Figure: z plotted against w, showing the slope ∂z/∂w and the steps Δz and Δw]
• Going in the opposite direction to the slope will lead to
a smaller z
• But not too far, otherwise we would go beyond the
optimal w
Finding The Best Weights -
Hill Descent
A ball on a complicated hilly terrain
rolls down to a local valley;
this is called a local minimum.
Questions:
How to get to the bottom of the deepest valley?
How to do this when we don’t have gravity?
Our $E_{\text{in}}$ Has Only One Valley
[Figure: in-sample error $E_{\text{in}}$ plotted against the weights w]
. . . because $E_{\text{in}}(w)$ is a convex function of w.
Gradient Descent Method
Batch Gradient Descent: use all examples in each iteration
Mini-batch Gradient Descent: use a subset of the examples in each iteration
Stochastic Gradient Descent: use one example in each iteration
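The three variants differ only in how many examples feed each update, so a single `batch_size` parameter can express all of them. The sketch below trains a logistic regression model this way. The data set, learning rate, epoch count, and function names are assumptions for illustration, and p(y = 1 | x) is taken to be the sigmoid of w·x.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gradient(batch, w):
    """Gradient of the average negative log-likelihood over `batch`."""
    grad = [0.0] * len(w)
    for x, y in batch:
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return [g / len(batch) for g in grad]


def train(data, n_features, batch_size, learning_rate=0.1, n_epochs=200, seed=0):
    """Gradient descent where each update uses `batch_size` examples."""
    rng = random.Random(seed)
    data = list(data)
    w = [0.0] * n_features
    for _ in range(n_epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            g = gradient(data[start:start + batch_size], w)
            w = [wj - learning_rate * gj for wj, gj in zip(w, g)]
    return w


# Made-up data: x = (bias term 1.0, one feature); label is 1 when feature > 0.
data = [((1.0, -2.0), 0), ((1.0, -1.0), 0), ((1.0, 1.0), 1), ((1.0, 2.0), 1)]

results = {
    "batch": train(data, 2, batch_size=len(data)),  # all examples per update
    "mini-batch": train(data, 2, batch_size=2),     # some examples per update
    "stochastic": train(data, 2, batch_size=1),     # one example per update
}
for name, w in results.items():
    print(name, [round(wj, 2) for wj in w])
```

All three runs use the same update rule. Smaller batches make more, noisier updates per pass over the data.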
Regularization
• Similar to other data estimation problems, we may not have enough
samples to learn good models for logistic regression classification
• One way to overcome this is to ‘regularize’ the model, that is, impose
additional constraints on the parameters we are fitting.
• For example, let's assume that $w_j$ comes from a Gaussian
distribution with mean 0 and variance $\sigma^2$ (where $\sigma^2$ is a
user-defined parameter): $w_j \sim N(0, \sigma^2)$
• In that case we have a prior on the parameters and so:
$$p(y = 1, \theta \mid X) \propto p(y = 1 \mid X; \theta)\, p(\theta)$$
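To see what this prior does to the fitting problem, one can take logarithms. The step below is a standard consequence of the Gaussian assumption above, written out for illustration rather than taken from the slides. Here L(y | X; w) denotes the data likelihood, and terms that do not depend on w are collected in a constant.

```latex
\begin{align*}
\log p(\theta) &= \sum_j \log N(w_j; 0, \sigma^2)
               = -\frac{1}{2\sigma^2} \sum_j w_j^2 + \text{const}, \\
\hat{w}_{\text{MAP}} &= \arg\max_w \Bigl[ \log L(y \mid X; w)
               - \frac{1}{2\sigma^2} \sum_j w_j^2 \Bigr].
\end{align*}
```

A smaller σ² therefore penalizes large weights more strongly, which is the familiar L2 penalty.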
Credits
Yaser Abu-Mostafa, California Institute of Technology (Caltech)
Barnabas Poczos and Ziv Bar-Joseph, School of Computer Science,
Carnegie Mellon University
Vibhav Gogate, The University of Texas at Dallas

3ml.pdf

  • 2. •Our goal is to estimate w from a training data of <xi ,yi > pairs •One way to find such relationship is to minimize the least squares error: Linear regression i i i w − wx )2 arg min ∑ ( y X Y y = wx + ε
  • 4. Marks of a student based on the number of hours he/she put into the preparation • Simple linear regression….. • Multiple linear regression…. • Non-linear problem….
  • 5. Marks of a student based on the number of hours he/she put into the preparation let’s assume Marks of a student (M) do depend on the number of hours (H) he/she put up for preparation. The following formula can represent the model: Marks = function (No. of hours) => Marks = m*Hours + c
  • 6. Marks of a student based on the number of hours he/she put into the preparation let’s assume Marks of a student (M) do depend on the number of hours (H) he/she put up for preparation. The following formula can represent the model: Marks = function (No. of hours) => Marks = m*Hours + c
  • 7. Marks of a student based on the number of hours he/she put into the preparation let’s Plot the data to check if it’s a Linear Problem – The easiest way to determine
  • 8. Marks of a student based on the number of hours he/she put into the preparation
  • 9. Marks of a student based on the number of hours he/she put into the preparation
  • 10. Marks of a student based on the number of hours he/she put into the preparation How to determine the Slope of line The value of m….
  • 11. Marks of a student based on the number of hours he/she put into the preparation • The value of m (slope of the line) can be determined using an objective function which is a combination of the loss function and a regularization term. • For simple linear regression, the objective function would be the summation of Mean Squared Error (MSE). • The best fit line would be obtained by minimizing the objective function (summation of mean squared error).
  • 12. Marks of a student based on the number of hours he/she put into the preparation • The value of m (slope of the line) can be determined using an objective function which is a combination of the loss function and a regularization term. • For simple linear regression, the objective function would be the summation of Mean Squared Error (MSE). • The best fit line would be obtained by minimizing the objective function (summation of mean squared error).
  • 13. Predicting weight reduction in form of the number of KGs reduced • Lets Assume IT could depend upon input features such as: age, height, the weight of the person, and the time spent on exercises,
  • 14. Predicting weight reduction in form of the number of KGs reduced Weight Reduction = Function(Age, Height, Weight, Time On Exercise) => Shoe-size = b1*Height + b2*Weight + b3*age + b4*time On Exercise + b0
  • 15. Predicting weight reduction in form of the number of KGs reduced As part of training the above model Goal: Find the value of b1, b2, b3, b4, and b0 which would minimize the objective function. The objective function would be the summation of mean squared error which is nothing but the sum of the square of the actual value and the predicted value for different values of age, height, weight, and time On Exercise
  • 16. Forecasting sales Organizations often use linear regression models to forecast future sales. This can be helpful for things like budgeting and planning. Algorithms such as Amazon’s item-to-item collaborative filtering are used to predict what customers will buy in the future based on their past purchase history
  • 17. Cash forecasting Many businesses use linear regression to forecast how much cash they’ll have on hand in the future. This is important for things like managing expenses and ensuring that there is enough cash on hand to cover unexpected costs.
  • 18. Analyzing survey data Linear regression can also be used to analyze survey data. This can help businesses understand things like customer satisfaction and product preferences. For example, a company might use linear regression to figure out how likely people are to recommend their product to others..
  • 19. Stock predictions A lot of businesses use linear regression models to predict how stocks will perform in the future. This is done by analyzing past data on stock prices and trends to identify patterns.
  • 20. Predicting consumer behavior Businesses can use linear regression to predict things like how much a customer is likely to spend. Regression models can also be used to predict consumer behavior. This can be helpful for things like targeted marketing and product development. For example, Walmart uses linear regression to predict what products will be popular in different regions of the country.
  • 21. Code:
  • 22.
  • 23. Code: import numpy as np import matplotlib.pyplot as plt def estimate_coef(x, y): # number of observations/points n = np.size(x) # mean of x and y vector m_x = np.mean(x) m_y = np.mean(y) # calculating cross-deviation and deviation about x SS_xy = np.sum(y*x) - n*m_y*m_x SS_xx = np.sum(x*x) - n*m_x*m_x # calculating regression coefficients b_1 = SS_xy / SS_xx b_0 = m_y - b_1*m_x return (b_0, b_1)
  • 24. def plot_regression_line(x, y, b): # plotting the actual points as scatter plot plt.scatter(x, y, color = "m", marker = "o", s = 30) # predicted response vector y_pred = b[0] + b[1]*x # plotting the regression line plt.plot(x, y_pred, color = "g") # putting labels plt.xlabel('x') plt.ylabel('y') # function to show plot plt.show() def main(): # observations / data x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12]) # estimating coefficients b = estimate_coef(x, y) print("Estimated coefficients:nb_0 = {} nb_1 = {}".format(b[0], b[1])) # plotting regression line plot_regression_line(x, y, b) if __name__ == "__main__": main()
  • 27. Regression for classification • In some cases we can use linear regression to determine the appropriate decision boundary. • However, since the output is usually binary or discrete, there are more efficient regression methods.
  • 28. Regression for classification • Assume we would like to use linear regression to learn the parameters for $p(y \mid X; \theta)$ • Problems? • Optimal regression model (labels 1 and -1): $w^T X \ge 0 \Rightarrow$ classify as 1; $w^T X < 0 \Rightarrow$ classify as -1
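One possible answer to the "Problems?" question can be illustrated with a small sketch. The data below is made up for illustration, and `fit_linear` and `classify` are hypothetical helper names. The sketch fits a least-squares line to labels in {-1, 1} and classifies by the sign of $w^T X$. Adding a single point that lies far on the correct side of the boundary shifts the fitted line enough to misclassify a point that was previously handled correctly.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit of y ~ w[0] + w[1] * x (hypothetical helper)."""
    A = np.column_stack([np.ones_like(x, dtype=float), x])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def classify(w, x):
    """Rule from the slide: w^T X >= 0 -> class 1, otherwise class -1."""
    scores = w[0] + w[1] * x
    return np.where(scores >= 0, 1, -1)

# Illustrative 1-D data (an assumption, not from the slides)
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w_clean = fit_linear(x, y)

# Same data plus one "easy" positive example far from the boundary
x_out = np.append(x, 30.0)
y_out = np.append(y, 1.0)
w_outlier = fit_linear(x_out, y_out)

print("without outlier:", classify(w_clean, x))
print("with outlier:   ", classify(w_outlier, x))
```

Squared error penalizes the distant point for being "too correct", which is one reason a model that outputs probabilities is usually preferred for classification.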
  • 30. Logistic Regression Logistic regression is a predictive modeling technique in which the target variable (output) takes discrete values for a given set of features or inputs (X), for example whether someone is COVID-19 positive (1) or negative (0). It is a simple yet powerful classification algorithm that machine learning borrowed from statistics. It is sometimes claimed that around 60% of the world’s classification problems can be solved with logistic regression.
  • 31. Logistic Regression Logistic regression is one of the most common machine learning algorithms used for binary classification. It predicts the probability of occurrence of a binary outcome. Typical applications include fraud detection, spam detection, and cancer detection.
  • 32. Sigmoid Function The sigmoid is a mathematical function that maps any real value to a value between 0 and 1; its curve is shaped like the letter “S”. The sigmoid function is also called the logistic function: $g(h) = \frac{1}{1 + e^{-h}}$
  • 34. Sigmoid Function If the input z (the argument of the sigmoid) goes to positive infinity, the predicted value of y approaches 1; if it goes to negative infinity, the predicted value of y approaches 0. If the output of the sigmoid function is greater than 0.5, we classify the example as class 1 (the positive class); if it is less than 0.5, we classify it as class 0 (the negative class).
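The behaviour described above can be sketched in a few lines of NumPy. The function names are illustrative. The slide does not say what happens at exactly 0.5, so assigning that case to class 1 is an assumption of this sketch.

```python
import numpy as np

def sigmoid(h):
    """Logistic function g(h) = 1 / (1 + exp(-h))."""
    return 1.0 / (1.0 + np.exp(-h))

def predict_label(h, threshold=0.5):
    """Class 1 if sigmoid(h) >= threshold, else class 0.

    Sending the tie at exactly 0.5 to class 1 is an assumption.
    """
    return (sigmoid(h) >= threshold).astype(int)

z = np.array([-10.0, -2.0, 0.0, 3.0, 10.0])
print(sigmoid(z))        # values near 0 for very negative z, near 1 for very positive z
print(predict_label(z))  # 0/1 labels after thresholding at 0.5
```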
  • 35. Diff b/w Linear & Logistic Regression Linear regression is used when the dependent variable is continuous, for example weight or height. In contrast, logistic regression is used when the dependent variable is binary or limited to a few categories, for example yes/no, true/false, or 0/1. In the 19th century, people used linear regression in biology to predict disease, but this is risky: for example, if a patient has a malignant tumor but the model outputs 0.4, linear regression reports the tumor as benign (because the value is below 0.5). That’s where logistic regression comes in: it models the probability of the outcome, which is then thresholded to give a binary result.
  • 36. Logistic regression vs. Linear regression $p(y = 0 \mid X; \theta) = g(w^T X) = \frac{1}{1 + e^{w^T X}}$ and $p(y = 1 \mid X; \theta) = 1 - g(w^T X) = \frac{e^{w^T X}}{1 + e^{w^T X}}$
  • 37. Determining parameters for logistic regression problems • So how do we learn the parameters? • Similar to other regression problems we look for the MLE for w • The likelihood of the data given the model is: $L(y \mid X; w) = \prod_i \left(1 - g(X_i; w)\right)^{y_i} \, g(X_i; w)^{(1 - y_i)}$, where $p(y = 0 \mid X; \theta) = g(X; w) = \frac{1}{1 + e^{w^T X}}$ and $p(y = 1 \mid X; \theta) = 1 - g(X; w) = \frac{e^{w^T X}}{1 + e^{w^T X}}$
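As a rough sketch, the log of the likelihood above can be evaluated directly. The code follows this slide's convention, in which g(X; w) is the probability of class 0. The toy data and the function names are assumptions made for illustration.

```python
import numpy as np

def g(X, w):
    """p(y = 0 | X; w) in the slide's convention: 1 / (1 + exp(w^T X))."""
    return 1.0 / (1.0 + np.exp(X @ w))

def log_likelihood(X, y, w):
    """log L(y | X; w) = sum_i [ y_i log(1 - g_i) + (1 - y_i) log g_i ]."""
    p0 = g(X, w)
    return np.sum(y * np.log(1.0 - p0) + (1 - y) * np.log(p0))

# Toy data (illustrative): first column is a constant 1 for the bias term
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])

ll_zero = log_likelihood(X, y, np.array([0.0, 0.0]))  # every probability is 0.5
ll_fit = log_likelihood(X, y, np.array([0.0, 1.0]))   # weights that match the labels
print(ll_zero, ll_fit)
```

Maximum likelihood estimation looks for the w that makes this quantity as large as possible; weights that agree with the labels score higher than the all-zero weights.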
  • 38. Gradient Descent Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model.
  • 39. Gradient Descent Starting at the top of the mountain, we take our first step downhill in the direction specified by the negative gradient. Next we recalculate the negative gradient (passing in the coordinates of our new point) and take another step in the direction it specifies. We continue this process iteratively until we get to the bottom of our graph, or to a point where we can no longer move downhill: a local minimum.
  • 41. Learning Rate The size of these steps is called the learning rate. With a high learning rate we can cover more ground each step, but we risk overshooting the lowest point since the slope of the hill is constantly changing. With a very low learning rate, we can confidently move in the direction of the negative gradient since we are recalculating it so frequently. A low learning rate is more precise, but calculating the gradient is time-consuming, so it will take us a very long time to get to the bottom.
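The effect of the learning rate can be seen on a toy one-dimensional problem. The sketch below minimizes f(w) = (w - 3)^2, a function chosen purely for illustration, with three hypothetical step sizes.

```python
def gradient_descent(lr, steps=50, w0=0.0):
    """Minimize f(w) = (w - 3)^2 by repeatedly stepping against the gradient."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w = w - lr * grad        # move in the direction of the negative gradient
    return w

w_good = gradient_descent(lr=0.1)   # moderate step: ends close to the minimum at w = 3
w_slow = gradient_descent(lr=0.01)  # tiny step: still far from 3 after 50 steps
w_big = gradient_descent(lr=1.1)    # too large: overshoots more on every step

print(w_good, w_slow, w_big)
```

For this function each update multiplies the distance to the minimum by (1 - 2 * lr), which is why a very small rate converges slowly and a rate above 1 moves away from the minimum.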
  • 42. Gradient ascent (Figure: z plotted against w, with slope ∂z/∂w and step Δw.) • Moving in the direction of the slope leads to a larger z • But not too far, otherwise we would go beyond the optimal w
  • 43. Gradient descent (Figure: z plotted against w, with slope ∂z/∂w and steps Δz and Δw.) • Moving in the opposite direction to the slope leads to a smaller z • But not too far, otherwise we would go beyond the optimal w
  • 44. Finding The Best Weights - Hill Descent A ball on a complicated hilly terrain rolls down to a local valley; this is called a local minimum. Questions: How do we get to the bottom of the deepest valley? How do we do this when we don’t have gravity?
  • 45. Our $E_{in}$ Has Only One Valley . . . because $E_{in}(w)$ is a convex function of w. (Figure: in-sample error $E_{in}$ plotted against the weights w.)
  • 46. Gradient Descent Method • Batch Gradient Descent: use all examples in each iteration • Mini-batch Gradient Descent: use some examples in each iteration • Stochastic Gradient Descent: use 1 example in each iteration
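The three variants differ only in how many examples feed each update, so one loop with a `batch_size` parameter can sketch all of them. This is a minimal illustration for logistic regression. It assumes the usual convention p(y = 1 | x) = sigmoid(w^T x). The data, learning rate, and function names are made up for the example.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def fit_logistic(X, y, batch_size, lr=0.5, epochs=200, seed=0):
    """Gradient descent on the average negative log-likelihood.

    batch_size = len(X)     -> batch gradient descent
    1 < batch_size < len(X) -> mini-batch gradient descent
    batch_size = 1          -> stochastic gradient descent
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)               # reshuffle the examples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            p = sigmoid(X[idx] @ w)              # predicted p(y = 1) for this batch
            grad = X[idx].T @ (p - y[idx]) / len(idx)
            w -= lr * grad                       # step against the gradient
    return w

# Toy data (illustrative): first column is a constant 1 for the bias term
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w_batch = fit_logistic(X, y, batch_size=4)
w_mini = fit_logistic(X, y, batch_size=2)
w_sgd = fit_logistic(X, y, batch_size=1)
print(w_batch, w_mini, w_sgd)
```

Batch updates are the smoothest but touch every example per step, while stochastic updates are cheap and noisy; mini-batches sit in between.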
  • 47. Regularization • As with other estimation problems, we may not have enough samples to learn good models for logistic regression classification • One way to overcome this is to ‘regularize’ the model, that is, to impose additional constraints on the parameters we are fitting. • For example, let’s assume that $w_j$ comes from a Gaussian distribution with mean 0 and variance $\sigma^2$ (where $\sigma^2$ is a user-defined parameter): $w_j \sim N(0, \sigma^2)$ • In that case we have a prior on the parameters and so: $p(y = 1, \theta \mid X) \propto p(y = 1 \mid X; \theta) \, p(\theta)$
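To see what the Gaussian prior does in practice, one can take logarithms of the posterior over all training pairs. The short derivation below is a sketch under the slide's assumptions (independent examples, independent weights with $w_j \sim N(0, \sigma^2)$); the symbol $\hat{w}_{\text{MAP}}$ is introduced here for illustration.

```latex
\begin{aligned}
\log p(w \mid X, y)
  &= \sum_i \log p(y_i \mid X_i; w) + \sum_j \log N(w_j; 0, \sigma^2) + \text{const} \\
  &= \sum_i \log p(y_i \mid X_i; w) - \frac{1}{2\sigma^2} \sum_j w_j^2 + \text{const} \\[4pt]
\hat{w}_{\text{MAP}}
  &= \arg\max_w \left[ \sum_i \log p(y_i \mid X_i; w) - \frac{1}{2\sigma^2} \sum_j w_j^2 \right]
\end{aligned}
```

Under these assumptions the prior acts as a squared (L2) penalty on the weights, and a smaller $\sigma^2$ corresponds to stronger regularization.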
  • 48. Credits Yaser Abu-Mostafa, California Institute of Technology (Caltech); Barnabas Poczos and Ziv Bar-Joseph, School of Computer Science, Carnegie Mellon University; Vibhav Gogate, The University of Texas at Dallas