SlideShare a Scribd company logo
History of Regression
• T H E T E R M R E G R E S S I O N WA S I N T R O D U C E D B Y F R A N C I S G A LTO N I N 1 8 8 6 I N H I S R E S E A R C H PA P E R “ FA M I LY
L I K E N E S S I N S TAT U R E ”
• H E O B S E R V E D T H AT TA L L PA R E N T S H AV E TA L L S O N S A N D S H O R T PA R E N T S H AV E S H O R T S O N S
• H E C A M E U P W I T H T H E V I E W T H AT T H E H E I G H T O F T H E S O N S R E G R E S S “ R E V E R T B A C K ” TO WA R D S T H E AV E R A G E
H E I G H T O F T H E P O P U L AT I O N
Dr. Deepak Mehta
Associate Professor
AIT CSE
Regression Introduction
• Lets take a simple example : Suppose your manager asked you to predict annual sales.
• There can be a hundred of factors (drivers) that affects sales
• In this case, sales is your dependent variable. Factors affecting sales are independent
variables.
• Regression analysis would help you to solve this problem.
• In simple words, regression analysis is used to model the relationship between a
dependent variable and one or more independent variables.
It helps us to answer the following questions –
• Which of the drivers have a significant impact on sales.
• Which is the most important driver of sales
• How do the drivers interact with each other
• What would be annual sale next year
Regressing Terminologies
Outliers
Suppose there is an observation in the dataset which is having a very high or very low
value as compared to the other observations in the data, i.e. it does not belong to the
population, such an observation is called an outlier. In simple words, it is extreme value.
An outlier is a problem because many times it hampers the results we get.
Multicollinearity
When the independent variables are highly correlated to each other then the variables
are said to be multicollinear. Many types of regression techniques assumes
multicollinearity should not be present in the dataset. It is because it causes problems in
ranking variables based on its importance. Or it makes job difficult in selecting the most
important independent variable (factor).
Terminologies
Heteroscedasticity
When dependent variable’s variability is not equal across values of an independent
variable, it is called heteroscedasticity. Example – As one’s income increases, the
variability of food consumption will increase. A poorer person will spend a rather constant
amount by always eating inexpensive food; a wealthier person may occasionally buy
inexpensive food and at other times eat expensive meals. Those with higher incomes
display a greater variability of food consumption.
Underfitting and Overfitting
When we use unnecessary explanatory variables it might lead to overfitting. Overfitting
means that our algorithm works well on the training set but is unable to perform better on
the test sets. It is also known as problem of high variance.
History of Regression
Galton’s law of universal regression was confirmed by his friend Karl Pearson, who collected data
from more than 1000 families.
He found that the average height of the sons of tall father is less than the their father’s height
and average height of the sons of short father is more than their fathers’ height.
Research paper of Karl Pearson
“On law of inheritance, biometrika 1903
Modern Regression
It investigate the dependence of one variable, conventionally called dependent variable, on one
or more variables called independent variable and provide an equation to be used for estimating
or predicting the average of the dependent variable from the known values of independent
variable.
Modern Regression
A variable whose variation we try to explain is dependent variable while the independent
variable is a variable that is used to explain the variation in the dependent variable.
When dependency of variable is studied upon one variable it is called simple linear regression or
two variable regression
While dependency upon two or more variable is studied in multiple regression
Deterministic vs Probabilistic
If there is a exact relationship b.w. variables s.t knowing value of one variable we can have
unique value of other, such relation is known as Deterministic relationship or mathematical
relation.
e.g. consider y=a + bx. Substituting value of x we can have unique value of y
Example of such relationship is F=32+(9/5)C, relationship b.w Fahrenheit and Celsius.
Area of circle pi*r*r
Deterministic vs Probabilistic
If relationship b.w. variable is not exct, we cannot have precisely value of one variable by putting
the value of the other, such relationship is known as probabilistic or stochastic or statistical
relationship
e.g. consider y=a + bX+e. substituting value of x we cannot have unique value of y, until we know
e. where e is unknown random error.
For ex. Consider relationship b.w. weight and height of person . All person with same height will
not have same weight.
In deterministic model all the point of order pairing of two variable (x,y) fall on the line of
relationship.
Dependent Variable
Whereas in probabilistic model all the points of order pairing of two variable (x,y) donot fall on
the line of relationship, there must be scattering around the line of relationship
Dependent variable is random variable also called
◦ Explained variable
◦ Predictand
◦ Regressand
◦ Response
◦ Endogenous
◦ Outcome
◦ Controlled variable
Independent variable
The independent variable is fixed variable also called
◦ Explanatory variable
◦ Predictor
◦ Regressor
◦ Stimulus
◦ Exogenous
◦ Covariate
◦ Control variable
Assumption of Linear Regression
Linear Relationship
Very low /no multicollinearity
No heteroscedastic
No autocorrelation b.w the errors
Normal distribution of errors
All observation are independent
Summary
LR is a ML algorithm based on Supervised Learning
Regression model is a target prediction model based on independent variables
Regression technique find out the relationship bw X(input) and Y(output) e.g. workexpSalary
Hypothesis function of LR: y=W0+W1X
When training this model : it fits the best line to predict the values of y for a given value of X
..
•It shows the significant relationship b.w. label( dependent variable) and the features
(independent variables)
•It indicate the strength of impact of multiple independent variables on a dependent variable
•Regression model is used to predict the continuous value
•LR establishes a relationship b.w. dependent variable (y) and one or more independent
variables(X) using best fit straight line (also known as regression line)
There must be a linear relation between independent and dependent variables
There should not be any outliers present
No heteroscedasticity
Sample observations should be independent
Error terms should be normally distributed with mean 0 and constant variance
Absence of multicollinearity and auto-correlation
Interpretation of regression
coefficients
Let us consider an example where the dependent variable is marks obtained by a
student and explanatory variables are number of hours studied and no. of classes
attended. Suppose on fitting linear regression we got the linear regression as:
Marks obtained = 5 + 2 (no. of hours studied) + 0.5(no. of classes attended)
Thus we can have the regression coefficients 2 and 0.5 which can interpreted as:
- If no. of hours studied and no. of classes are 0 then the student will obtain 5 marks.
- Keeping no. of classes attended constant, if student studies for one hour more then he
will score 2 more marks in the examination.
- Similarly keeping no. of hours studied constant, if student attends one more class then
he will attain 0.5 marks more.
Least Squares
As we know y=w0+w1X+error h(xi)+ei
Error=yi-h(xi)
Error is residual error in the observation
Main aim is to minimize the residual error
Squared error or cost function:
Linear Regression Assignment Advertisement Dataset
X = advertising['TV']
y = advertising['Sales’]
-------------------------------------------------------
from sklearn.model_selection import train_test_split
X_train, X_test, y_train,y_test=train_test_split(X, y, train_size = 0.7, test_size = 0.3, random_state = 100)
X_train.head()
y_train.head()

More Related Content

Similar to ML4 Regression.pptx

Multicollinearity PPT
Multicollinearity PPTMulticollinearity PPT
Multicollinearity PPT
GunjanKhandelwal13
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
Avjinder (Avi) Kaler
 
Artificial Intelligence (Unit - 8).pdf
Artificial Intelligence   (Unit  -  8).pdfArtificial Intelligence   (Unit  -  8).pdf
Artificial Intelligence (Unit - 8).pdf
SathyaNarayanan47813
 
MIKKAH MAE C. MANGAYAN- FS103.inferential-statisticspptx
MIKKAH MAE C. MANGAYAN- FS103.inferential-statisticspptxMIKKAH MAE C. MANGAYAN- FS103.inferential-statisticspptx
MIKKAH MAE C. MANGAYAN- FS103.inferential-statisticspptx
NoelJoseMalanum1
 
Data-Analysis.pptx
Data-Analysis.pptxData-Analysis.pptx
Data-Analysis.pptx
RobertRevamonte1
 
Correlations and t scores (2)
Correlations and t scores (2)Correlations and t scores (2)
Correlations and t scores (2)
Pedro Martinez
 
Correlation and Regression.pdf
Correlation and Regression.pdfCorrelation and Regression.pdf
Correlation and Regression.pdf
AadarshSah1
 
Quantitative analysis
Quantitative analysisQuantitative analysis
Quantitative analysis
Rajesh Mishra
 
Real Estate Data Set
Real Estate Data SetReal Estate Data Set
Real Estate Data Set
Sarah Jimenez
 
Research hypothesis....ppt
Research hypothesis....pptResearch hypothesis....ppt
Research hypothesis....ppt
Rahul Dhaker
 
Mr 4. quantitative research design and methods
Mr 4. quantitative research design and methodsMr 4. quantitative research design and methods
Mr 4. quantitative research design and methods
S'Roni Roni
 
Data analysis test for association BY Prof Sachin Udepurkar
Data analysis   test for association BY Prof Sachin UdepurkarData analysis   test for association BY Prof Sachin Udepurkar
Data analysis test for association BY Prof Sachin Udepurkar
sachinudepurkar
 
HYPOTHESES.pptx
HYPOTHESES.pptxHYPOTHESES.pptx
HYPOTHESES.pptx
TalhaKhan420569
 
Glm
GlmGlm
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
curwenmichaela
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
jasoninnes20
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
richardnorman90310
 
Recep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managersRecep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managers
recepmaz
 
Recep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managersRecep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managers
recepmaz
 
REG.pptx
REG.pptxREG.pptx
REG.pptx
RaniT16
 

Similar to ML4 Regression.pptx (20)

Multicollinearity PPT
Multicollinearity PPTMulticollinearity PPT
Multicollinearity PPT
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Artificial Intelligence (Unit - 8).pdf
Artificial Intelligence   (Unit  -  8).pdfArtificial Intelligence   (Unit  -  8).pdf
Artificial Intelligence (Unit - 8).pdf
 
MIKKAH MAE C. MANGAYAN- FS103.inferential-statisticspptx
MIKKAH MAE C. MANGAYAN- FS103.inferential-statisticspptxMIKKAH MAE C. MANGAYAN- FS103.inferential-statisticspptx
MIKKAH MAE C. MANGAYAN- FS103.inferential-statisticspptx
 
Data-Analysis.pptx
Data-Analysis.pptxData-Analysis.pptx
Data-Analysis.pptx
 
Correlations and t scores (2)
Correlations and t scores (2)Correlations and t scores (2)
Correlations and t scores (2)
 
Correlation and Regression.pdf
Correlation and Regression.pdfCorrelation and Regression.pdf
Correlation and Regression.pdf
 
Quantitative analysis
Quantitative analysisQuantitative analysis
Quantitative analysis
 
Real Estate Data Set
Real Estate Data SetReal Estate Data Set
Real Estate Data Set
 
Research hypothesis....ppt
Research hypothesis....pptResearch hypothesis....ppt
Research hypothesis....ppt
 
Mr 4. quantitative research design and methods
Mr 4. quantitative research design and methodsMr 4. quantitative research design and methods
Mr 4. quantitative research design and methods
 
Data analysis test for association BY Prof Sachin Udepurkar
Data analysis   test for association BY Prof Sachin UdepurkarData analysis   test for association BY Prof Sachin Udepurkar
Data analysis test for association BY Prof Sachin Udepurkar
 
HYPOTHESES.pptx
HYPOTHESES.pptxHYPOTHESES.pptx
HYPOTHESES.pptx
 
Glm
GlmGlm
Glm
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
 
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docxBUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx
 
Recep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managersRecep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managers
 
Recep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managersRecep maz msb 701 quantitative analysis for managers
Recep maz msb 701 quantitative analysis for managers
 
REG.pptx
REG.pptxREG.pptx
REG.pptx
 

Recently uploaded

bank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdfbank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdf
Divyam548318
 
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
University of Maribor
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
RadiNasr
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
JamalHussainArman
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
anoopmanoharan2
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
Rahul
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
mamunhossenbd75
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
NidhalKahouli2
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
IJNSA Journal
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
nooriasukmaningtyas
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
yokeleetan1
 
Series of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.pptSeries of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.ppt
PauloRodrigues104553
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
Mukeshwaran Balu
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
Wearable antenna for antenna applications
Wearable antenna for antenna applicationsWearable antenna for antenna applications
Wearable antenna for antenna applications
Madhumitha Jayaram
 

Recently uploaded (20)

bank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdfbank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdf
 
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
 
Series of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.pptSeries of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.ppt
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Wearable antenna for antenna applications
Wearable antenna for antenna applicationsWearable antenna for antenna applications
Wearable antenna for antenna applications
 

ML4 Regression.pptx

  • 1. History of Regression • T H E T E R M R E G R E S S I O N WA S I N T R O D U C E D B Y F R A N C I S G A LTO N I N 1 8 8 6 I N H I S R E S E A R C H PA P E R “ FA M I LY L I K E N E S S I N S TAT U R E ” • H E O B S E R V E D T H AT TA L L PA R E N T S H AV E TA L L S O N S A N D S H O R T PA R E N T S H AV E S H O R T S O N S • H E C A M E U P W I T H T H E V I E W T H AT T H E H E I G H T O F T H E S O N S R E G R E S S “ R E V E R T B A C K ” TO WA R D S T H E AV E R A G E H E I G H T O F T H E P O P U L AT I O N Dr. Deepak Mehta Associate Professor AIT CSE
  • 2. Regression Introduction • Lets take a simple example : Suppose your manager asked you to predict annual sales. • There can be a hundred of factors (drivers) that affects sales • In this case, sales is your dependent variable. Factors affecting sales are independent variables. • Regression analysis would help you to solve this problem. • In simple words, regression analysis is used to model the relationship between a dependent variable and one or more independent variables. It helps us to answer the following questions – • Which of the drivers have a significant impact on sales. • Which is the most important driver of sales • How do the drivers interact with each other • What would be annual sale next year
  • 3. Regressing Terminologies Outliers Suppose there is an observation in the dataset which is having a very high or very low value as compared to the other observations in the data, i.e. it does not belong to the population, such an observation is called an outlier. In simple words, it is extreme value. An outlier is a problem because many times it hampers the results we get. Multicollinearity When the independent variables are highly correlated to each other then the variables are said to be multicollinear. Many types of regression techniques assumes multicollinearity should not be present in the dataset. It is because it causes problems in ranking variables based on its importance. Or it makes job difficult in selecting the most important independent variable (factor).
  • 4. Terminologies Heteroscedasticity When dependent variable’s variability is not equal across values of an independent variable, it is called heteroscedasticity. Example – As one’s income increases, the variability of food consumption will increase. A poorer person will spend a rather constant amount by always eating inexpensive food; a wealthier person may occasionally buy inexpensive food and at other times eat expensive meals. Those with higher incomes display a greater variability of food consumption. Underfitting and Overfitting When we use unnecessary explanatory variables it might lead to overfitting. Overfitting means that our algorithm works well on the training set but is unable to perform better on the test sets. It is also known as problem of high variance.
  • 5. History of Regression Galton’s law of universal regression was confirmed by his friend Karl Pearson, who collected data from more than 1000 families. He found that the average height of the sons of tall father is less than the their father’s height and average height of the sons of short father is more than their fathers’ height. Research paper of Karl Pearson “On law of inheritance, biometrika 1903
  • 6. Modern Regression It investigate the dependence of one variable, conventionally called dependent variable, on one or more variables called independent variable and provide an equation to be used for estimating or predicting the average of the dependent variable from the known values of independent variable.
  • 7. Modern Regression A variable whose variation we try to explain is dependent variable while the independent variable is a variable that is used to explain the variation in the dependent variable. When dependency of variable is studied upon one variable it is called simple linear regression or two variable regression While dependency upon two or more variable is studied in multiple regression
  • 8. Deterministic vs Probabilistic If there is a exact relationship b.w. variables s.t knowing value of one variable we can have unique value of other, such relation is known as Deterministic relationship or mathematical relation. e.g. consider y=a + bx. Substituting value of x we can have unique value of y Example of such relationship is F=32+(9/5)C, relationship b.w Fahrenheit and Celsius. Area of circle pi*r*r
  • 9. Deterministic vs Probabilistic If relationship b.w. variable is not exct, we cannot have precisely value of one variable by putting the value of the other, such relationship is known as probabilistic or stochastic or statistical relationship e.g. consider y=a + bX+e. substituting value of x we cannot have unique value of y, until we know e. where e is unknown random error. For ex. Consider relationship b.w. weight and height of person . All person with same height will not have same weight. In deterministic model all the point of order pairing of two variable (x,y) fall on the line of relationship.
  • 10. Dependent Variable Whereas in probabilistic model all the points of order pairing of two variable (x,y) donot fall on the line of relationship, there must be scattering around the line of relationship Dependent variable is random variable also called ◦ Explained variable ◦ Predictand ◦ Regressand ◦ Response ◦ Endogenous ◦ Outcome ◦ Controlled variable
  • 11. Independent variable The independent variable is fixed variable also called ◦ Explanatory variable ◦ Predictor ◦ Regressor ◦ Stimulus ◦ Exogenous ◦ Covariate ◦ Control variable
  • 12.
  • 13. Assumption of Linear Regression Linear Relationship Very low /no multicollinearity No heteroscedastic No autocorrelation b.w the errors Normal distribution of errors All observation are independent
  • 14. Summary LR is a ML algorithm based on Supervised Learning Regression model is a target prediction model based on independent variables Regression technique find out the relationship bw X(input) and Y(output) e.g. workexpSalary Hypothesis function of LR: y=W0+W1X When training this model : it fits the best line to predict the values of y for a given value of X
  • 15. .. •It shows the significant relationship b.w. label( dependent variable) and the features (independent variables) •It indicate the strength of impact of multiple independent variables on a dependent variable •Regression model is used to predict the continuous value •LR establishes a relationship b.w. dependent variable (y) and one or more independent variables(X) using best fit straight line (also known as regression line)
  • 16. There must be a linear relation between independent and dependent variables There should not be any outliers present No heteroscedasticity Sample observations should be independent Error terms should be normally distributed with mean 0 and constant variance Absence of multicollinearity and auto-correlation
  • 17. Interpretation of regression coefficients Let us consider an example where the dependent variable is marks obtained by a student and explanatory variables are number of hours studied and no. of classes attended. Suppose on fitting linear regression we got the linear regression as: Marks obtained = 5 + 2 (no. of hours studied) + 0.5(no. of classes attended) Thus we can have the regression coefficients 2 and 0.5 which can interpreted as: - If no. of hours studied and no. of classes are 0 then the student will obtain 5 marks. - Keeping no. of classes attended constant, if student studies for one hour more then he will score 2 more marks in the examination. - Similarly keeping no. of hours studied constant, if student attends one more class then he will attain 0.5 marks more.
  • 18. Least Squares As we know y=w0+w1X+error h(xi)+ei Error=yi-h(xi) Error is residual error in the observation Main aim is to minimize the residual error Squared error or cost function:
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Linear Regression Assignment Advertisement Dataset X = advertising['TV'] y = advertising['Sales’] ------------------------------------------------------- from sklearn.model_selection import train_test_split X_train, X_test, y_train,y_test=train_test_split(X, y, train_size = 0.7, test_size = 0.3, random_state = 100) X_train.head() y_train.head()