2. Common concerns
● Collaboration
○ A group has at most 3 members.
○ Submit a technical report individually; it should cover the overall framework of the project and your own contribution.
○ Your score combines the quality of the whole project and your individual contribution.
● Project
○ A good technical report is composed of three parts: a novel idea, good writing, and code (good code ≠ high score)
○ Project flowchart (next slide)
● Submit your group list and project proposal (up to one page) by the end of this
week.
3. Flowchart of an ML project
[Flowchart: data collection and annotation → dataset split into training data (60%), validation data (20%), and test data (20%) → machine learning model]
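One way to implement this split (a minimal MATLAB sketch; the 60/20/20 ratios come from the flowchart, the toy data and variable names are my own):

    % Toy data: 10 examples with 2 features (illustrative only)
    X = rand(10, 2);  y = rand(10, 1);
    m = size(X, 1);                      % number of examples
    idx = randperm(m);                   % shuffle example indices
    nTrain = round(0.6 * m);  nVal = round(0.2 * m);
    Xtrain = X(idx(1:nTrain), :);              ytrain = y(idx(1:nTrain));
    Xval   = X(idx(nTrain+1:nTrain+nVal), :);  yval   = y(idx(nTrain+1:nTrain+nVal));
    Xtest  = X(idx(nTrain+nVal+1:end), :);     ytest  = y(idx(nTrain+nVal+1:end));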
6. Concerns
● If α is too small, gradient descent can be slow: more iterations are needed
● If α is too large, gradient descent can overshoot the minimum. It may fail to
converge, or even diverge.
● May converge to a local minimum if the cost function is non-convex
● As we approach a local minimum, the gradient shrinks, so gradient descent automatically takes smaller steps. So, no need to decrease α over time.
● “Batch”: all the training examples are used in each step of gradient descent
7. Gradient descent for linear regression
(one variable)
Repeat until convergence {
    θ_j := θ_j − α · ∂/∂θ_j J(θ_0, θ_1)    (simultaneously for j = 0 and j = 1)
}
8. Question: which algorithm is correct?
Repeat until convergence {
    θ_0 := θ_0 − α · ∂/∂θ_0 J(θ_0, θ_1)
    θ_1 := θ_1 − α · ∂/∂θ_1 J(θ_0, θ_1)    (uses the already-updated θ_0)
}
Incorrect: not a simultaneous update

Repeat until convergence {
    temp0 := θ_0 − α · ∂/∂θ_0 J(θ_0, θ_1)
    temp1 := θ_1 − α · ∂/∂θ_1 J(θ_0, θ_1)
    θ_0 := temp0
    θ_1 := temp1
}
Correct: both parameters are updated simultaneously
9. Gradient descent for linear regression
(one variable)
Repeat until convergence {
    θ_0 := θ_0 − α · (1/m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))
    θ_1 := θ_1 − α · (1/m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) · x^(i)
}
(update θ_0 and θ_1 simultaneously)
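A minimal MATLAB sketch of this "batch" update (my own illustration on toy data; the temp variables make the simultaneous update from slide 8 explicit):

    % Batch gradient descent for h(x) = theta0 + theta1*x; all m examples used per step
    x = [1; 2; 3; 4];  y = [2.1; 3.9; 6.2; 8.0];   % toy one-variable data
    alpha = 0.05;  theta0 = 0;  theta1 = 0;  m = length(y);
    for iter = 1:1500
        h = theta0 + theta1 * x;                   % predictions on all examples
        temp0 = theta0 - alpha * (1/m) * sum(h - y);
        temp1 = theta1 - alpha * (1/m) * sum((h - y) .* x);
        theta0 = temp0;  theta1 = temp1;           % simultaneous update
    end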
10. Linear regression with multiple variables
● n: number of features
● x^(i): input features of the i-th training example
● x_j^(i): value of feature j in the i-th training example
Area of site (1000 square feet) | Size of living place (1000 square feet) | Number of rooms | Age in years | Selling price
3.472 | 0.998 | 7 | 42 | 25.9
3.531 | 1.500 | 7 | 62 | 29.5
2.275 | 1.175 | 6 | 40 | 27.9
4.050 | 1.232 | 6 | 54 | 25.9
... | ... | ... | ... | ...
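With a leading column of ones (the x_0 = 1 convention, see slide 19), the hypothesis and cost vectorize in one line each; a sketch using the table's four rows (variable names are mine):

    % Design matrix: leading ones column + the four features above
    X = [ones(4,1), [3.472; 3.531; 2.275; 4.050], [0.998; 1.500; 1.175; 1.232], ...
         [7; 7; 6; 6], [42; 62; 40; 54]];
    y = [25.9; 29.5; 27.9; 25.9];                 % selling prices
    theta = zeros(5, 1);  m = length(y);
    h = X * theta;                                % h(x^(i)) = theta' * x^(i) for all i
    J = (1/(2*m)) * sum((h - y) .^ 2);            % squared-error cost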
14. Suggestions on gradient descent
● How to make sure gradient descent is working correctly?
● How to choose learning rate?
15. Make sure gradient descent is working correctly
● Ideal cost curve: the cost decreases sharply at first, then keeps decreasing slightly
● Declare convergence if the cost decrease between two consecutive iterations is less than a threshold
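One way to code the threshold test (a sketch; X and y are assumed to be a feature-normalized design matrix and targets, and the threshold value is my choice, not from the slides):

    % Run batch gradient descent until the cost barely decreases
    threshold = 1e-4;  alpha = 0.1;  maxIter = 10000;
    m = length(y);  theta = zeros(size(X, 2), 1);  Jprev = inf;
    for iter = 1:maxIter
        theta = theta - alpha * (1/m) * (X' * (X * theta - y));  % one GD step
        J = (1/(2*m)) * sum((X * theta - y) .^ 2);               % cost after the step
        if Jprev - J < threshold
            break;                   % declare convergence
        end
        Jprev = J;
    end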
16. Choosing learning rate
● α too small, more iterations; α too large, may not converge
● To choose α, try several values and compare the resulting cost curves
[Plots: cost vs. iteration count for an ideal α, a too-small α, and a too-large α]
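A sketch of such a sweep (my own; the candidate values below are illustrative, and X, y are assumed normalized as in the previous sketch):

    % Plot the cost curve for several candidate learning rates
    alphas = [0.001, 0.01, 0.1, 1];  nIter = 100;  m = length(y);
    for k = 1:length(alphas)
        theta = zeros(size(X, 2), 1);
        Jhist = zeros(nIter, 1);
        for iter = 1:nIter
            theta = theta - alphas(k) * (1/m) * (X' * (X * theta - y));
            Jhist(iter) = (1/(2*m)) * sum((X * theta - y) .^ 2);
        end
        semilogy(1:nIter, Jhist);  hold on;        % log scale also shows divergence
    end
    xlabel('iteration');  ylabel('cost J(\theta)');
    legend('\alpha = 0.001', '\alpha = 0.01', '\alpha = 0.1', '\alpha = 1');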
17. Feature normalization - intuition
● Each feature has a different scale, which may generate elongated, oval-shaped cost contours
● Result: more iterations to converge
● Solution: feature normalization
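Mean normalization is one common implementation (a sketch; X here is the raw m × n feature matrix, before any column of ones is added):

    % Rescale every feature to zero mean and unit standard deviation
    mu = mean(X);                    % 1 x n row of feature means
    sigma = std(X);                  % 1 x n row of feature standard deviations
    Xnorm = (X - mu) ./ sigma;       % implicit expansion (MATLAB R2016b+ / Octave)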
19. Normal equation for linear regression
Area of site (1000 square feet) | Size of living place (1000 square feet) | Number of rooms | Age in years | Selling price
3.472 | 0.998 | 7 | 42 | 25.9
3.531 | 1.500 | 7 | 62 | 29.5
2.275 | 1.175 | 6 | 40 | 27.9
4.050 | 1.232 | 6 | 54 | 25.9

Add a constant feature x_0 = 1 to every example:

x_0 | Area of site (1000 square feet) | Size of living place (1000 square feet) | Number of rooms | Age in years | Selling price
1 | 3.472 | 0.998 | 7 | 42 | 25.9
1 | 3.531 | 1.500 | 7 | 62 | 29.5
1 | 2.275 | 1.175 | 6 | 40 | 27.9
1 | 4.050 | 1.232 | 6 | 54 | 25.9
20. Normal equation for linear regression
● Instead of gradient descent, we can obtain the optimal parameters directly from the normal equation:
    θ = (XᵀX)⁻¹ Xᵀ y
● Matlab one-line code: pinv(X'*X)*X'*y
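Expanded slightly (a sketch; X is assumed to carry the leading column of ones shown on the previous slide):

    % Closed-form solution: theta = (X'X)^(-1) X'y
    theta = pinv(X' * X) * X' * y;   % the one-liner from the slide
    % pinv (pseudo-inverse) still returns a solution when X'X is non-invertible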
21. Gradient descent vs normal equation
Gradient descent
● Need to choose learning rate
● Need many iterations
● Works well even when the number of features n is very large
Normal equation
● No learning rate
● No iterations
● Need to compute (XᵀX)⁻¹, which is slow when n is very large, and XᵀX may be non-invertible
27. Interpretation of hypothesis output
● Estimated probability that y = 1, given x, “parameterized” by θ:
    h_θ(x) = P(y = 1 | x; θ)
● Example: given tumor size, if we have h_θ(x) = 0.7, tell the patient that there is a 70% chance of the tumor being malignant.
● As we only have two classes, we have:
    P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ)
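This probability comes from the standard sigmoid (logistic) hypothesis h_θ(x) = g(θᵀx); a sketch, assuming a design matrix X and fitted parameters theta:

    % Sigmoid hypothesis: h(x) = 1 / (1 + e^(-theta' * x)) = P(y = 1 | x; theta)
    g = @(z) 1 ./ (1 + exp(-z));     % sigmoid function
    h = g(X * theta);                % probability of y = 1 for every example
    % e.g. h(i) = 0.7 reads as a 70% estimated chance that example i is malignant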
29. Decision boundary
● h_θ(x) = g(θᵀx)
● Decision boundary: the set of points where θᵀx = 0, separating the region predicted as y = 1 from the region predicted as y = 0
● Note: different ML models generate different decision boundaries
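For logistic regression the boundary can be read off θ directly, since g(z) ≥ 0.5 exactly when z ≥ 0 (a sketch, continuing the previous snippet):

    % Predict y = 1 wherever theta' * x >= 0, i.e. h(x) >= 0.5
    pred = (X * theta >= 0);         % logical vector of class predictions
    % the points with theta' * x = 0 form the decision boundary itself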
30. Cost function
● A new representation of the cost function:
    J(θ) = (1/m) · Σ_{i=1..m} Cost(h_θ(x^(i)), y^(i))
● In linear regression, it is given by:
    Cost(h_θ(x), y) = (1/2) · (h_θ(x) − y)²
31. Cost function
● Logistic regression cost function:
    Cost(h_θ(x), y) = −log(h_θ(x))        if y = 1
    Cost(h_θ(x), y) = −log(1 − h_θ(x))    if y = 0
● Intuition: if h_θ(x) → 0 but y = 1, the learning algorithm is penalized by a very large cost.
● We can compact the two cases into one equation:
    Cost(h_θ(x), y) = −y · log(h_θ(x)) − (1 − y) · log(1 − h_θ(x))
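The compact form translates directly into vectorized MATLAB (a sketch; the eps guard against log(0) is my own safeguard, not from the slides):

    % Logistic regression cost over all m examples, compact form
    m = length(y);
    h = 1 ./ (1 + exp(-X * theta));  % predicted probabilities
    J = -(1/m) * sum(y .* log(h + eps) + (1 - y) .* log(1 - h + eps));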