Regression
Linear regression
• It is a simple model, but it is worth mastering because it lays the foundation for many other machine learning algorithms.
• It can be used to understand the factors that influence profitability.
• It can be used to forecast sales in the coming months by analyzing the sales data for previous months.
• It can also be used to gain insights about customer behaviour.
Linear regression
• The objective of a linear regression model is to find a relationship between one or more features (independent variables) and a continuous target variable (dependent variable).
• When there is only one feature, it is called univariate linear regression.
• If there are multiple features, it is called multiple linear regression.
Hypothesis of linear regression
• The linear regression model can be represented by the following equation:
  ŷ = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ
• ŷ is the predicted value.
• θ₀ is the bias term.
• θ₁, …, θₙ are the model parameters.
• x₁, x₂, …, xₙ are the feature values.
Hypothesis of linear regression
• In vectorized form the hypothesis is ŷ = θᵀx, where:
• θ is the model's parameter vector, including the bias term θ₀.
• x is the feature vector, with x₀ = 1.
• Training the model means finding the parameters θ for which the model best fits the data.
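The vectorized hypothesis ŷ = θᵀx can be sketched in a few lines of NumPy. The data and parameter values below are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def predict(X, theta):
    """Hypothesis h(x) = theta^T x for a feature matrix X whose first column is all 1s (x0 = 1)."""
    return X @ theta

# Two examples with one feature each; the bias column x0 = 1 is prepended.
X = np.array([[1.0, 2.0],
              [1.0, 3.0]])
theta = np.array([0.5, 2.0])   # theta0 (bias) = 0.5, theta1 = 2.0

print(predict(X, theta))       # [4.5 6.5]
```

Prepending the constant column x₀ = 1 lets the bias term ride along in the same dot product as the other parameters.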
How do we determine the best fit line?
• The line for which the error (the residuals) between the predicted values and the observed values is minimum is called the best fit line, or the regression line.
• To define and measure the error of our model, we define the cost function as the sum of the squares of the residuals.
The cost function
• The cost function is given by:
  J(θ) = (1/2m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
• where the hypothesis function h(x) is given by h(x) = θᵀx = θ₀ + θ₁x₁ + … + θₙxₙ.
• m is the total number of training examples in our dataset.
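The cost function J(θ) = (1/2m) Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)² can be computed in vectorized form. This is a minimal sketch; the toy data is an assumption for illustration.

```python
import numpy as np

def cost(X, y, theta):
    """Mean squared residuals scaled by 1/2: J(theta) = (1/2m) * sum((X theta - y)^2)."""
    m = len(y)
    residuals = X @ theta - y
    return (residuals @ residuals) / (2 * m)

X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 4.0])

# With theta = 0 the residuals are -y, so J = (4 + 16) / (2 * 2) = 5.0
print(cost(X, y, np.zeros(2)))   # 5.0
```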
Training a linear regression model
• Why do we take the square of the residuals and not their absolute value? Because we want to penalize points that are farther from the regression line much more than points that lie close to it.
• Our objective is to find the model parameters for which the cost function is minimum. We will use gradient descent to find them.
Gradient descent
• Gradient descent is a generic optimization algorithm used in many machine learning algorithms.
• It iteratively tweaks the parameters of the model in order to minimize the cost function.
Gradient descent
• We first initialize the model parameters with some random values. This is also called random initialization.
• We then need to measure how the cost function changes as its parameters change, so we compute the partial derivatives of the cost function with respect to the parameters θ₀, θ₁, …, θₙ.
Gradient descent
• The partial derivative of the cost function with respect to any parameter θⱼ is:
  ∂J(θ)/∂θⱼ = (1/m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾
• We can compute the partial derivatives for all parameters at once using the gradient vector:
  ∇J(θ) = (1/m) Xᵀ(Xθ − y)
• where h(x) is θᵀx.
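The vectorized gradient (1/m) Xᵀ(Xθ − y) computes all partial derivatives at once. A minimal sketch with assumed toy data:

```python
import numpy as np

def gradient(X, y, theta):
    """All partial derivatives of J at once: (1/m) * X^T (X theta - y)."""
    m = len(y)
    return X.T @ (X @ theta - y) / m

X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 4.0])

# At theta = 0 the residuals are [-2, -4], so the gradient is X^T [-2, -4] / 2.
g = gradient(X, y, np.zeros(2))
print(g)   # [-3. -5.]
```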
Gradient descent
• After computing the derivatives, we update each parameter as given below:
  θⱼ := θⱼ − α ∂J(θ)/∂θⱼ
• where α (alpha) is the learning rate.
• We can update all the parameters at once using the vectorized form:
  θ := θ − α ∇J(θ)
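A single vectorized update step θ := θ − α ∇J(θ) looks like this; the gradient values and learning rate below are illustrative assumptions.

```python
import numpy as np

theta = np.array([0.0, 0.0])
grad = np.array([-3.0, -5.0])   # gradient at the current theta (illustrative values)
alpha = 0.1                     # learning rate

# One gradient-descent step: move against the gradient, scaled by alpha.
theta = theta - alpha * grad
print(theta)   # [0.3 0.5]
```

Because the gradient points in the direction of steepest increase of the cost, subtracting it moves the parameters toward lower cost.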
Gradient descent
• We repeat the earlier steps until the cost function converges to its minimum value.
• To demonstrate the gradient descent algorithm, we initialize the model parameters with 0, so the initial equation becomes ŷ = 0. Gradient descent then updates the parameter values step by step until we arrive at the best fit line.
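The full loop described above — initialize θ to 0, then repeatedly step against the gradient — can be sketched as follows. The data (which satisfies y = 1 + 2x exactly) and the hyperparameters are assumptions for illustration.

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])    # exactly y = 1 + 2x

theta = np.zeros(2)              # initialize the model parameters with 0
alpha = 0.1                      # learning rate

for _ in range(5000):
    grad = X.T @ (X @ theta - y) / len(y)   # gradient of the cost
    theta -= alpha * grad                   # update step

print(np.round(theta, 3))        # approximately [1. 2.] -- the best fit line
```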
When the learning rate is very small, gradient descent takes a long time to find the best fit line.
When the learning rate is moderate, gradient descent converges smoothly to the best fit line.
When the learning rate is arbitrarily high, gradient descent keeps overshooting the best fit line and may even fail to converge.
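The three regimes above can be demonstrated by running the same problem with different learning rates. Data, step count, and rate values are assumptions chosen to make the contrast visible.

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])

def run(alpha, steps=200):
    """Run gradient descent for a fixed number of steps and return the final cost."""
    theta = np.zeros(2)
    for _ in range(steps):
        theta -= alpha * X.T @ (X @ theta - y) / len(y)
    residuals = X @ theta - y
    return residuals @ residuals / (2 * len(y))

print(run(0.001))  # very small alpha: cost has barely decreased after 200 steps
print(run(0.1))    # moderate alpha: cost is close to zero
print(run(0.5))    # too large: the updates overshoot and the cost explodes
```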
Gradient descent
Evaluating the performance of the model
• Root mean squared error (RMSE) and the coefficient of determination (R² score) are popularly used to evaluate regression models.
• RMSE is the square root of the average of the squared residuals.
• RMSE is defined by:
  RMSE = √[(1/m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²]
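RMSE as defined above is a one-liner in NumPy; the predictions below are assumed values for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Square root of the mean of the squared residuals."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

# sqrt((0.25 + 0 + 1) / 3) = sqrt(1.25 / 3)
print(rmse(y_true, y_pred))
```

Because RMSE is in the same units as the target variable, it is easier to interpret than the raw sum of squared residuals.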
Evaluating the performance of the model
• R² is determined by:
  R² = 1 − SSᵣ / SSₜ
• SSₜ is the total sum of squared errors if we take the mean of the observed values as the predicted value.
• SSᵣ is the sum of the squared residuals of the regression.
R square
• SSₜ = 69.47588572871659
• SSᵣ = 7.64070234454893
• R² score = 0.8900236785122296
• If we use the mean of the observed values as the predicted value, the total squared error is 69.47588572871659; if we use regression, it is 7.64070234454893. By using regression we reduced the prediction error by ~89%.
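Plugging the sums of squares quoted on the slide into R² = 1 − SSᵣ / SSₜ reproduces the stated score:

```python
# R^2 = 1 - SS_r / SS_t, using the values quoted on the slide.
ss_t = 69.47588572871659   # total sum of squares (mean as the predictor)
ss_r = 7.64070234454893    # sum of squared residuals of the regression

r2 = 1 - ss_r / ss_t
print(r2)   # 0.8900236785122296 -- the R^2 score above
```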

Editor's Notes
• Partial derivatives: we want to see how the function changes as we let just one variable change while holding all the others constant. The gradient of a function f is the collection of all its partial derivatives into a vector.