2. Andrew Ng
Size (feet²)   Price ($1000)
2104           460
1416           232
1534           315
 852           178
…              …
Multiple features (variables)
Size (feet²)  Number of bedrooms  Number of floors  Age of home (years)  Price ($1000)
2104          5                   1                 45                   460
1416          3                   2                 40                   232
1534          3                   2                 30                   315
 852          2                   1                 36                   178
…             …                   …                 …                    …
Multiple features (variables).
Notation:
n = number of features
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example
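This notation maps directly onto array indexing. A minimal sketch using the housing table above, assuming NumPy; the 1-based notation of the slides becomes 0-based indices in code, and the variable names are illustrative:

```python
import numpy as np

# Feature matrix from the housing table above:
# columns = [size, bedrooms, floors, age], one row per training example.
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
])

n = X.shape[1]      # n: number of features (4)
x_2 = X[1]          # x^(2): features of the 2nd training example
x_2_3 = X[1, 2]     # x_3^(2): value of feature 3 in the 2nd example
```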
E.g. x_1 = size (0-2000 feet²)
     x_2 = number of bedrooms (1-5)
Feature Scaling
Idea: Make sure features are on a similar scale.
E.g. x_1 = size (feet²) / 2000
     x_2 = number of bedrooms / 5
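A minimal sketch of both scalings on the table's first two features, assuming NumPy; dividing by the range puts each feature roughly in [0, 1], and mean normalization is a common refinement:

```python
import numpy as np

X = np.array([
    [2104.0, 5.0],   # [size in feet², number of bedrooms]
    [1416.0, 3.0],
    [1534.0, 3.0],
    [ 852.0, 2.0],
])

# Range scaling: divide each feature by its typical maximum,
# so every column ends up roughly in [0, 1].
X_scaled = X / np.array([2000.0, 5.0])

# Mean normalization: x_j := (x_j - mean_j) / (max_j - min_j),
# so every column is centered at 0 and roughly within [-0.5, 0.5].
mu = X.mean(axis=0)
rng = X.max(axis=0) - X.min(axis=0)
X_norm = (X - mu) / rng
```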
Gradient descent
- "Debugging": how to make sure gradient descent is working correctly.
- How to choose the learning rate α.
Making sure gradient descent is working correctly.
[Plot: J(θ) versus no. of iterations (0-400); the curve decreases on every iteration and flattens out as gradient descent converges.]
Or, an example automatic convergence test: declare convergence if J(θ) decreases by less than some small threshold ε in one iteration.
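The convergence test above can be folded directly into the update loop. A sketch assuming NumPy; the function name, default α, and default ε are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, eps=1e-3, max_iters=10000):
    """Batch gradient descent with an automatic convergence test:
    stop when J(theta) decreases by less than eps in one iteration."""
    m, n = X.shape
    theta = np.zeros(n)
    cost = lambda t: ((X @ t - y) ** 2).sum() / (2 * m)  # J(theta)
    J_history = [cost(theta)]
    for _ in range(max_iters):
        # Simultaneous update of all parameters.
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        J_history.append(cost(theta))
        if J_history[-2] - J_history[-1] < eps:  # automatic convergence test
            break
    return theta, J_history
```

Plotting J_history against the iteration number reproduces the debugging plot described above.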
Making sure gradient descent is working correctly.
[Plots: J(θ) versus no. of iterations. If J(θ) is increasing or oscillating, gradient descent is not working; use a smaller α.]
- For sufficiently small α, J(θ) should decrease on every iteration.
- But if α is too small, gradient descent can be slow to converge.
Summary:
- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; may not converge.
To choose α, try
…, 0.001, …, 0.01, …, 0.1, …, 1, …
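The sweep over candidate learning rates can be automated. A sketch assuming NumPy; the helper name, iteration count, and tiny dataset are illustrative, and only the explicitly listed rates 0.001, 0.01, 0.1, 1 are tried:

```python
import numpy as np

def run_gd(X, y, alpha, iters=50):
    """Run a fixed number of gradient descent steps; return J(theta)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
    return ((X @ theta - y) ** 2).sum() / (2 * m)

# Toy data: y = 2x exactly, so the minimum of J is 0.
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Compare the cost after a fixed budget of iterations for each rate;
# too small converges slowly, too large makes the cost blow up.
for alpha in [0.001, 0.01, 0.1, 1.0]:
    print(alpha, run_gd(X, y, alpha))
```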
Examples:

Size (feet²)  Number of bedrooms  Number of floors  Age of home (years)  Price ($1000)
2104          5                   1                 45                   460
1416          3                   2                 40                   232
1534          3                   2                 30                   315
 852          2                   1                 36                   178

With the extra feature x_0 = 1 added:

x_0  Size (feet²)  Number of bedrooms  Number of floors  Age of home (years)  Price ($1000)
1    2104          5                   1                 45                   460
1    1416          3                   2                 40                   232
1    1534          3                   2                 30                   315
1     852          2                   1                 36                   178
m training examples, n features.
Gradient Descent                       Normal Equation
• Need to choose α.                    • No need to choose α.
• Needs many iterations.               • Don't need to iterate.
• Works well even when n is large.     • Need to compute (XᵀX)⁻¹.
                                       • Slow if n is very large.
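The normal equation θ = (XᵀX)⁻¹Xᵀy can be sketched as follows, assuming NumPy; the tiny dataset is illustrative:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^{-1} X^T y.
    np.linalg.solve is preferred over forming the inverse explicitly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Tiny check on data where y = 1 + 2x exactly.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = normal_equation(X, y)   # approximately [1, 2]
```

No feature scaling and no iteration is needed here, which is exactly the trade-off summarized in the table above.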
What if XᵀX is non-invertible?
• Redundant features (linearly dependent).
  E.g. x_1 = size in feet², x_2 = size in m².
• Too many features (e.g. m ≤ n).
  - Delete some features, or use regularization.
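The redundant-feature case can be demonstrated directly. A sketch assuming NumPy; when XᵀX is singular, `np.linalg.pinv` still returns a least-squares solution where a plain inverse is unreliable:

```python
import numpy as np

# Redundant features: x_2 is x_1 converted from feet² to m²
# (1 ft² ≈ 0.092903 m²), so the columns of X are linearly
# dependent and X^T X is singular (non-invertible).
x1 = np.array([2104.0, 1416.0, 1534.0, 852.0])   # size in feet²
x2 = x1 * 0.092903                               # same size in m²
y = np.array([460.0, 232.0, 315.0, 178.0])       # price in $1000

X = np.column_stack([np.ones(4), x1, x2])

# The pseudoinverse handles the rank deficiency gracefully.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
```

Deleting the redundant feature (or regularizing) removes the problem at its source; the pseudoinverse only papers over it numerically.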
Editor's Notes
Pop-up Quiz
Features should take similar ranges of values; this leads to faster convergence of gradient descent.
A feature's range of values shouldn't be very large or very tiny; ideally it is not far from the -1 to +1 range.
Debug by plotting the cost function J versus the number of iterations. Gradient descent is working well if J decreases in the plot; the plot also shows where the algorithm converges, as the curve flattens out.
Choosing the threshold ε in the automatic convergence test is difficult.
We try to pick the largest reasonable value of the learning rate to get fast convergence.
We can define new features to get a better model.
We can fit a polynomial function within the linear regression model by redefining or choosing features.
In this case, feature scaling becomes extremely important.
This slide shows another possible choice of features.
Pop-up Quiz
Feature scaling is not necessary here.
When the number of features is small (less than about 1000), the normal equation method is useful. When the model becomes more complicated (n > 1000), it is better to use gradient descent.
Regularization makes it possible to fit a large number of parameters using a small number of training examples.