2. Common concerns
● Collaboration
○ A group has at most 3 members.
○ Submit a technical report individually; it should cover the overall framework of the project and your own contribution.
○ Your score combines the quality of the whole project and your individual contribution.
● Project
○ A good technical report is composed of three parts: a novel idea, good writing, and code (good code ≠ high score)
○ Project flowchart (next slide)
● Submit your group list and project proposal (up to one page) by the end of this
week.
3. Flowchart of an ML project
[Flowchart: data collection and annotation → dataset split into training data (60%), validation data (20%), and test data (20%) → machine learning model]
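One way to implement this split (a minimal MATLAB sketch; the 60/20/20 ratios come from the flowchart, the toy data and variable names are my own):

    % Toy data: 10 examples with 2 features (illustrative only)
    X = rand(10, 2);  y = rand(10, 1);
    m = size(X, 1);                      % number of examples
    idx = randperm(m);                   % shuffle example indices
    nTrain = round(0.6 * m);  nVal = round(0.2 * m);
    Xtrain = X(idx(1:nTrain), :);              ytrain = y(idx(1:nTrain));
    Xval   = X(idx(nTrain+1:nTrain+nVal), :);  yval   = y(idx(nTrain+1:nTrain+nVal));
    Xtest  = X(idx(nTrain+nVal+1:end), :);     ytest  = y(idx(nTrain+nVal+1:end));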
6. Concerns
● If α is too small, gradient descent can be slow: more iterations are needed
● If α is too large, gradient descent can overshoot the minimum. It may fail to
converge, or even diverge.
● May converge to a local minimum if the cost function is non-convex
● As we approach a local minimum, the gradient shrinks, so gradient descent automatically takes smaller steps. So, no need to decrease α over time.
● “Batch”: all the training examples are used in each step of gradient descent
7. Gradient descent for linear regression
(one variable)
Repeat until convergence {
    θ_j := θ_j − α · ∂/∂θ_j J(θ_0, θ_1)    (simultaneously for j = 0 and j = 1)
}
8. Question: which algorithm is correct?
Repeat until convergence {
    θ_0 := θ_0 − α · ∂/∂θ_0 J(θ_0, θ_1)
    θ_1 := θ_1 − α · ∂/∂θ_1 J(θ_0, θ_1)    (uses the already-updated θ_0)
}
Incorrect: not a simultaneous update

Repeat until convergence {
    temp0 := θ_0 − α · ∂/∂θ_0 J(θ_0, θ_1)
    temp1 := θ_1 − α · ∂/∂θ_1 J(θ_0, θ_1)
    θ_0 := temp0
    θ_1 := temp1
}
Correct: both parameters are updated simultaneously
9. Gradient descent for linear regression
(one variable)
Repeat until convergence {
    θ_0 := θ_0 − α · (1/m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))
    θ_1 := θ_1 − α · (1/m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) · x^(i)
}
(update θ_0 and θ_1 simultaneously)
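A minimal MATLAB sketch of this "batch" update (my own illustration on toy data; the temp variables make the simultaneous update from slide 8 explicit):

    % Batch gradient descent for h(x) = theta0 + theta1*x; all m examples used per step
    x = [1; 2; 3; 4];  y = [2.1; 3.9; 6.2; 8.0];   % toy one-variable data
    alpha = 0.05;  theta0 = 0;  theta1 = 0;  m = length(y);
    for iter = 1:1500
        h = theta0 + theta1 * x;                   % predictions on all examples
        temp0 = theta0 - alpha * (1/m) * sum(h - y);
        temp1 = theta1 - alpha * (1/m) * sum((h - y) .* x);
        theta0 = temp0;  theta1 = temp1;           % simultaneous update
    end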
10. Linear regression with multiple variables
● n: number of features
● x^(i): input features of the i-th training example
● x_j^(i): value of feature j in the i-th training example
Area of site (1000 square feet) | Size of living place (1000 square feet) | Number of rooms | Age in years | Selling price
3.472 | 0.998 | 7 | 42 | 25.9
3.531 | 1.500 | 7 | 62 | 29.5
2.275 | 1.175 | 6 | 40 | 27.9
4.050 | 1.232 | 6 | 54 | 25.9
... | ... | ... | ... | ...
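With a leading column of ones (the x_0 = 1 convention, see slide 19), the hypothesis and cost vectorize in one line each; a sketch using the table's four rows (variable names are mine):

    % Design matrix: leading ones column + the four features above
    X = [ones(4,1), [3.472; 3.531; 2.275; 4.050], [0.998; 1.500; 1.175; 1.232], ...
         [7; 7; 6; 6], [42; 62; 40; 54]];
    y = [25.9; 29.5; 27.9; 25.9];                 % selling prices
    theta = zeros(5, 1);  m = length(y);
    h = X * theta;                                % h(x^(i)) = theta' * x^(i) for all i
    J = (1/(2*m)) * sum((h - y) .^ 2);            % squared-error cost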
14. Suggestions on gradient descent
● How to make sure gradient descent is working correctly?
● How to choose learning rate?
15. Make sure gradient descent is working correctly
● Ideal cost curve: the cost decreases sharply at first, then keeps decreasing slightly
● Declare convergence if the cost decrease between two consecutive iterations is less than a threshold
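One way to code the threshold test (a sketch; X and y are assumed to be a feature-normalized design matrix and targets, and the threshold value is my choice, not from the slides):

    % Run batch gradient descent until the cost barely decreases
    threshold = 1e-4;  alpha = 0.1;  maxIter = 10000;
    m = length(y);  theta = zeros(size(X, 2), 1);  Jprev = inf;
    for iter = 1:maxIter
        theta = theta - alpha * (1/m) * (X' * (X * theta - y));  % one GD step
        J = (1/(2*m)) * sum((X * theta - y) .^ 2);               % cost after the step
        if Jprev - J < threshold
            break;                   % declare convergence
        end
        Jprev = J;
    end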
16. Choosing learning rate
● α too small, more iterations; α too large, may not converge
● To choose α, try several values and compare the resulting cost curves
[Plots: cost vs. iteration count for an ideal α, a too-small α, and a too-large α]
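A sketch of such a sweep (my own; the candidate values below are illustrative, and X, y are assumed normalized as in the previous sketch):

    % Plot the cost curve for several candidate learning rates
    alphas = [0.001, 0.01, 0.1, 1];  nIter = 100;  m = length(y);
    for k = 1:length(alphas)
        theta = zeros(size(X, 2), 1);
        Jhist = zeros(nIter, 1);
        for iter = 1:nIter
            theta = theta - alphas(k) * (1/m) * (X' * (X * theta - y));
            Jhist(iter) = (1/(2*m)) * sum((X * theta - y) .^ 2);
        end
        semilogy(1:nIter, Jhist);  hold on;        % log scale also shows divergence
    end
    xlabel('iteration');  ylabel('cost J(\theta)');
    legend('\alpha = 0.001', '\alpha = 0.01', '\alpha = 0.1', '\alpha = 1');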
17. Feature normalization - intuition
● Each feature has a different scale, which may generate elongated, oval-shaped cost contours
● Result: more iterations to converge
● Solution: feature normalization
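Mean normalization is one common implementation (a sketch; X here is the raw m × n feature matrix, before any column of ones is added):

    % Rescale every feature to zero mean and unit standard deviation
    mu = mean(X);                    % 1 x n row of feature means
    sigma = std(X);                  % 1 x n row of feature standard deviations
    Xnorm = (X - mu) ./ sigma;       % implicit expansion (MATLAB R2016b+ / Octave)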
19. Normal equation for linear regression
Area of site (1000 square feet) | Size of living place (1000 square feet) | Number of rooms | Age in years | Selling price
3.472 | 0.998 | 7 | 42 | 25.9
3.531 | 1.500 | 7 | 62 | 29.5
2.275 | 1.175 | 6 | 40 | 27.9
4.050 | 1.232 | 6 | 54 | 25.9

Add a constant feature x_0 = 1 to every example:

x_0 | Area of site (1000 square feet) | Size of living place (1000 square feet) | Number of rooms | Age in years | Selling price
1 | 3.472 | 0.998 | 7 | 42 | 25.9
1 | 3.531 | 1.500 | 7 | 62 | 29.5
1 | 2.275 | 1.175 | 6 | 40 | 27.9
1 | 4.050 | 1.232 | 6 | 54 | 25.9
20. Normal equation for linear regression
● Instead of gradient descent, we can obtain the optimal parameters directly from the normal equation:
    θ = (XᵀX)⁻¹ Xᵀ y
● Matlab one-line code: pinv(X'*X)*X'*y
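Expanded slightly (a sketch; X is assumed to carry the leading column of ones shown on the previous slide):

    % Closed-form solution: theta = (X'X)^(-1) X'y
    theta = pinv(X' * X) * X' * y;   % the one-liner from the slide
    % pinv (pseudo-inverse) still returns a solution when X'X is non-invertible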
21. Gradient descent vs normal equation
Gradient descent
● Need to choose learning rate
● Need many iterations
● Works well even when the number of features n is very large
Normal equation
● No learning rate
● No iterations
● Need to compute (XᵀX)⁻¹, which is slow when n is very large, and XᵀX may be non-invertible
27. Interpretation of hypothesis output
● Estimated probability that y = 1, given x, “parameterized” by θ:
    h_θ(x) = P(y = 1 | x; θ)
● Example: given tumor size, if we have h_θ(x) = 0.7, tell the patient that there is a 70% chance of the tumor being malignant.
● As we only have two classes, we have:
    P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ)
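This probability comes from the standard sigmoid (logistic) hypothesis h_θ(x) = g(θᵀx); a sketch, assuming a design matrix X and fitted parameters theta:

    % Sigmoid hypothesis: h(x) = 1 / (1 + e^(-theta' * x)) = P(y = 1 | x; theta)
    g = @(z) 1 ./ (1 + exp(-z));     % sigmoid function
    h = g(X * theta);                % probability of y = 1 for every example
    % e.g. h(i) = 0.7 reads as a 70% estimated chance that example i is malignant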
29. Decision boundary
● h_θ(x) = g(θᵀx)
● Decision boundary: the set of points where θᵀx = 0, separating the region predicted as y = 1 from the region predicted as y = 0
● Note: different ML models generate different decision boundaries
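For logistic regression the boundary can be read off θ directly, since g(z) ≥ 0.5 exactly when z ≥ 0 (a sketch, continuing the previous snippet):

    % Predict y = 1 wherever theta' * x >= 0, i.e. h(x) >= 0.5
    pred = (X * theta >= 0);         % logical vector of class predictions
    % the points with theta' * x = 0 form the decision boundary itself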
30. Cost function
● A new representation of the cost function:
    J(θ) = (1/m) · Σ_{i=1..m} Cost(h_θ(x^(i)), y^(i))
● In linear regression, it is given by:
    Cost(h_θ(x), y) = (1/2) · (h_θ(x) − y)²
31. Cost function
● Logistic regression cost function:
    Cost(h_θ(x), y) = −log(h_θ(x))        if y = 1
    Cost(h_θ(x), y) = −log(1 − h_θ(x))    if y = 0
● Intuition: if h_θ(x) → 0 but y = 1, the learning algorithm is penalized by a very large cost.
● We can compact the two cases into one equation:
    Cost(h_θ(x), y) = −y · log(h_θ(x)) − (1 − y) · log(1 − h_θ(x))
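The compact form translates directly into vectorized MATLAB (a sketch; the eps guard against log(0) is my own safeguard, not from the slides):

    % Logistic regression cost over all m examples, compact form
    m = length(y);
    h = 1 ./ (1 + exp(-X * theta));  % predicted probabilities
    J = -(1/m) * sum(y .* log(h + eps) + (1 - y) .* log(1 - h + eps));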