Linear Regression with Multiple Variables
Machine Learning (Andrew Ng)

Multiple features
Size (feet²) | Price ($1000)
        2104 | 460
        1416 | 232
        1534 | 315
         852 | 178
           … | …
Multiple features (variables)
Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
        2104 | 5                  | 1                | 45                  | 460
        1416 | 3                  | 2                | 40                  | 232
        1534 | 3                  | 2                | 30                  | 315
         852 | 2                  | 1                | 36                  | 178
           … | …                  | …                | …                   | …
Multiple features (variables)
Notation:
  n = number of features
  x^(i) = input (features) of the i-th training example
  x_j^(i) = value of feature j in the i-th training example
Hypothesis:
  Previously (one feature): h_θ(x) = θ_0 + θ_1 x
  Now: h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n

For convenience of notation, define x_0 = 1. Then, with x = [x_0; x_1; …; x_n] and θ = [θ_0; θ_1; …; θ_n],
  h_θ(x) = θᵀx.
This is multivariate linear regression.
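For concreteness, here is a minimal Octave sketch of the vectorized hypothesis h_θ(x) = θᵀx; the parameter values and the names theta, x, h are illustrative, not from the slides.

Octave:
  theta = [80; 0.1; 50];     % example parameters theta_0, theta_1, theta_2
  x     = [1; 2104; 5];      % x_0 = 1, x_1 = size, x_2 = number of bedrooms
  h     = theta' * x;        % predicted price in $1000s: h_theta(x) = theta' * x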
Gradient descent for multiple variables
Hypothesis: h_θ(x) = θᵀx = θ_0 x_0 + θ_1 x_1 + … + θ_n x_n  (with x_0 = 1)
Parameters: θ = [θ_0; θ_1; …; θ_n]
Cost function:
  J(θ) = (1 / (2m)) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²
Gradient descent:
  Repeat {
    θ_j := θ_j − α ∂J(θ)/∂θ_j
  }  (simultaneously update θ_j for every j = 0, …, n)
Gradient descent
Previously (n = 1):
  Repeat {
    θ_0 := θ_0 − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))
    θ_1 := θ_1 − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x^(i)
  }  (simultaneously update θ_0, θ_1)
New algorithm (n ≥ 1):
  Repeat {
    θ_j := θ_j − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i)
  }  (simultaneously update θ_j for j = 0, 1, …, n)
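As a sketch (not from the slides), the update rule above can be written in vectorized Octave. The function name gradient_descent and the variables X (an m×(n+1) design matrix whose first column is all ones), y, theta, alpha and num_iters are illustrative assumptions.

Octave:
  function [theta, J_history] = gradient_descent(X, y, theta, alpha, num_iters)
    % Batch gradient descent for multivariate linear regression.
    m = length(y);
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
      h = X * theta;                                 % hypotheses for all m examples
      theta = theta - (alpha / m) * (X' * (h - y));  % simultaneous update of every theta_j
      J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);  % cost J(theta)
    end
  end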
Gradient descent in practice I: Feature scaling
Feature scaling
Idea: Make sure features are on a similar scale.
E.g. x_1 = size (0-2000 feet²)
     x_2 = number of bedrooms (1-5)
Feature scaling
Get every feature into approximately a −1 ≤ x_j ≤ +1 range.
Mean normalization
Replace x_j with x_j − μ_j to make features have approximately zero mean (do not apply to x_0 = 1).
E.g. x_j := (x_j − μ_j) / s_j, where μ_j is the average value of feature j in the training set and s_j is its range (max − min) or standard deviation.
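A minimal Octave sketch of mean normalization with scaling by the standard deviation; X_raw, mu, sigma, X_norm are illustrative names, and X_raw holds only the raw features (no x_0 column).

Octave:
  mu     = mean(X_raw);              % 1 x n row vector of feature means
  sigma  = std(X_raw);               % 1 x n standard deviations (the range max - min also works)
  X_norm = (X_raw - mu) ./ sigma;    % each feature now has roughly zero mean and unit scale
  % Reuse the same mu and sigma to normalize any new example before predicting.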
Gradient descent in practice II: Learning rate
Gradient descent
- "Debugging": How to make sure gradient descent is working correctly.
- How to choose the learning rate α.
Making sure gradient descent is working correctly.
[Plot: J(θ) versus no. of iterations (0-400); J(θ) should decrease after every iteration and flatten out as the algorithm converges.]
Example automatic convergence test: declare convergence if J(θ) decreases by less than some small threshold ε (e.g., 10⁻³) in one iteration.
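A sketch of this convergence check in Octave, assuming J_history holds the value of J(θ) after each iteration (e.g., from the gradient_descent sketch above) and epsilon is the chosen threshold.

Octave:
  epsilon = 1e-3;                         % convergence threshold (hard to choose well)
  for iter = 2:length(J_history)
    if abs(J_history(iter - 1) - J_history(iter)) < epsilon
      printf('Converged after %d iterations\n', iter);
      break;
    end
  end
  plot(1:length(J_history), J_history);   % J(theta) should decrease and flatten out
  xlabel('No. of iterations'); ylabel('J(theta)');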
Making sure gradient descent is working correctly.
[Plots: J(θ) versus no. of iterations. If J(θ) is increasing, or repeatedly goes down and up, gradient descent is not working; use a smaller α.]
- For sufficiently small α, J(θ) should decrease on every iteration.
- But if α is too small, gradient descent can be slow to converge.
Summary:
- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; may not converge.
To choose α, try values spaced roughly 3× apart:
  …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
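A rough Octave sketch of this search, reusing the gradient_descent sketch above; the 400 iterations and the zero initialization are arbitrary illustrative choices.

Octave:
  for alpha = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1]
    theta_init = zeros(size(X, 2), 1);
    [~, J_history] = gradient_descent(X, y, theta_init, alpha, 400);
    plot(1:400, J_history); hold on;   % keep the largest alpha whose J(theta) curve still decreases
  end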
Features and polynomial regression
Housing prices prediction
  h_θ(x) = θ_0 + θ_1 × frontage + θ_2 × depth
Rather than using frontage and depth as two separate features, we can define a new feature, area = frontage × depth. Defining new features can give a better model.
Polynomial regression
[Plot: price (y) versus size (x) with polynomial fits.]
We can fit, e.g., a cubic model h_θ(x) = θ_0 + θ_1 x + θ_2 x² + θ_3 x³ with linear regression by defining features x_1 = (size), x_2 = (size)², x_3 = (size)³. With such features, feature scaling becomes very important, since the three columns take wildly different ranges of values.
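A small Octave sketch of building the polynomial features from a single size column; size_col and the other names are illustrative.

Octave:
  X_poly = [size_col, size_col .^ 2, size_col .^ 3];   % x_1 = size, x_2 = size^2, x_3 = size^3
  mu     = mean(X_poly);
  sigma  = std(X_poly);
  X_poly = (X_poly - mu) ./ sigma;                     % scale the wildly different ranges
  X      = [ones(length(size_col), 1), X_poly];        % prepend the x_0 = 1 column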
Choice of features
[Plot: price (y) versus size (x).]
Instead of a quadratic model (which eventually comes back down as size grows), we might choose a different set of features, e.g. h_θ(x) = θ_0 + θ_1 (size) + θ_2 √(size).
Normal equation
Gradient descent: iteratively takes many steps to reach the minimum of J(θ).
Normal equation: a method to solve for θ analytically, in one step.
Intuition: If 1D (θ ∈ ℝ), J(θ) is a quadratic function of θ; set dJ(θ)/dθ = 0 and solve for θ.
For θ ∈ ℝⁿ⁺¹, J(θ_0, θ_1, …, θ_n) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²; set ∂J(θ)/∂θ_j = 0 (for every j) and solve for θ_0, θ_1, …, θ_n.
Examples: m = 4, n = 4.

x_0 | Size (feet²) x_1 | Number of bedrooms x_2 | Number of floors x_3 | Age of home (years) x_4 | Price ($1000) y
  1 | 2104             | 5                      | 1                    | 45                      | 460
  1 | 1416             | 3                      | 2                    | 40                      | 232
  1 | 1534             | 3                      | 2                    | 30                      | 315
  1 |  852             | 2                      | 1                    | 36                      | 178

Adding the extra column x_0 = 1 to every example, the design matrix X (m × (n+1)) and the target vector y are

  X = [ 1  2104  5  1  45 ;
        1  1416  3  2  40 ;
        1  1534  3  2  30 ;
        1   852  2  1  36 ]

  y = [ 460 ; 232 ; 315 ; 178 ]

and the normal equation gives θ = (XᵀX)⁻¹ Xᵀ y.
m examples; n features.
The design matrix X (m × (n+1)) has (x^(i))ᵀ as its i-th row, with x_0^(i) = 1, and y ∈ ℝᵐ holds the targets.
E.g. if n = 1, x^(i) = [1; x_1^(i)] and X = [1 x_1^(1); 1 x_1^(2); …; 1 x_1^(m)].
θ = (XᵀX)⁻¹ Xᵀ y, where (XᵀX)⁻¹ is the inverse of the matrix XᵀX.
Octave: pinv(X' * X) * X' * y
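Putting the pieces together for the four training examples above, a minimal Octave sketch (variable names are illustrative):

Octave:
  X = [1 2104 5 1 45;
       1 1416 3 2 40;
       1 1534 3 2 30;
       1  852 2 1 36];
  y = [460; 232; 315; 178];
  theta = pinv(X' * X) * X' * y;   % solves for theta analytically: no iterations, no alpha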
m training examples, n features.

Gradient descent:
• Need to choose α.
• Needs many iterations.
• Works well even when n is large.

Normal equation:
• No need to choose α.
• Don't need to iterate.
• Need to compute (XᵀX)⁻¹ (roughly O(n³)).
• Slow if n is very large.
Normal equation and non-invertibility (optional)
Normal equation: θ = (XᵀX)⁻¹ Xᵀ y
- What if XᵀX is non-invertible? (singular/degenerate)
- Octave: pinv(X' * X) * X' * y  (pinv computes the pseudo-inverse, so it still returns a sensible θ)
What if XᵀX is non-invertible?
• Redundant features (linearly dependent).
  E.g. x_1 = size in feet², x_2 = size in m² (then x_1 ≈ (3.28)² x_2; see the sketch below).
• Too many features (e.g. m ≤ n).
  - Delete some features, or use regularization.
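A small Octave sketch of the redundant-feature case (illustrative data): x2 is x1 converted from feet² to m², so the columns of X are linearly dependent, XᵀX is singular, and pinv still returns a usable θ where inv would fail.

Octave:
  x1 = [2104; 1416; 1534; 852];      % size in feet^2
  x2 = x1 / 3.28^2;                  % size in m^2 (1 m is roughly 3.28 ft)
  X  = [ones(4, 1), x1, x2];
  y  = [460; 232; 315; 178];
  rank(X' * X)                       % 2, not 3: X' * X is singular, inv(X' * X) fails
  theta = pinv(X' * X) * X' * y;     % pinv (pseudo-inverse) still gives a solution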

Machine Learning Lecture 3 (Linear Regression)

Editor's Notes

  • #5 Pop-up Quiz
  • #12 Features should take a similar range of values; this leads to faster convergence of gradient descent.
  • #13 Feature values should not be very large or very tiny; roughly, they should not be far from the −1 to +1 range.
  • #17 Debug by plotting the cost function J versus the number of iterations. The algorithm is working well if J decreases in the plot; the curve flattening out shows where the algorithm has converged. Choosing the threshold for the automatic convergence test is difficult.
  • #19 We try to pick the largest reasonable value for the learning rate to ensure fast convergence.
  • #21 We can define new features to get a better model.
  • #22 We can fit a polynomial function with the linear regression model by redefining or choosing the features. In this case, feature scaling becomes extremely important.
  • #23 This slide shows another different choice for your features.
  • #27 Pop-up Quiz
  • #28 Pop-up Quiz
  • #30 Feature scaling is not necessary here.
  • #31 When the number of features is below about 1000, the normal equation method is useful. When the model becomes more complicated (n > 1000), it is better to use gradient descent.
  • #34 Regularization makes it possible to fit a large number of parameters using a small number of training examples.