Generalized Linear Model
By
Rahul Narayanan
Agenda
• Refresher
• Definition of Generalized Linear Model
• What is a Normal Distribution
• What is a Linear Model
• Linear Modelling for Regression (Simple Linear Regression)
• Linear Modelling for Classification (Logistic Regression)
• Generalizing Linear Modelling of Classification and Regression using GLM
• Some GLMs
Refresher:
Types of ML: Supervised, Unsupervised, Reinforcement
GLM: Classification, Regression
Classification
• Response/output/dependent variable: categorical (or) discrete
• Examples: Yes/No, Survived/Dead, Lion/Tiger/Cheetah etc.
Regression
• Response/output/dependent variable: continuous (-∞ to +∞)
• Examples: 100.70, 25, -75.25
Quiz:
Question 1:
Suppose you are working on a weather prediction model, and you would like to predict whether or not it will be raining at 5 pm tomorrow.
Is this a Classification or a Regression problem?
Ans: Classification
• It will rain - 1
• It will not rain - 0
Quiz:
Question 2:
The HR department of an organization wants a salary prediction tool to decide on the salary of a new employee based on his/her experience.
Is this a Classification or a Regression problem?
Ans: Regression
• Independent variable --> Experience in years
• Dependent variable --> Salary
Quiz:
Question 3:
Weight of car (kg)   Engine capacity (litre)   Mileage (kmpl)
890                  1.2                       21
1200                 1.6                       19
920                  2.2                       15
700                  1.0                       22
Is this a Classification or a Regression problem?
Ans: Regression
• Independent variables --> Weight of the car, Engine capacity
• Dependent variable --> Mileage
Generalized Linear Model
Definition:
The Generalized Linear Model extends the General Linear Model by allowing the dependent variable to be related to a linear combination of the independent variables through a specified link function. Moreover, the model allows the dependent variable to have a non-normal distribution.
There are three components to a GLM:
1. Random Component
2. Systematic Component
3. Link Function
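Normal Distribution
Definition:
A Normal Distribution is an arrangement of a dataset in which most of the values cluster in the middle (around the mean) and the rest fall away from the mean on either side.
[Figure: bell curve with the horizontal axis marked at -3σ, -2σ, -1σ, µ, 1σ, 2σ, 3σ]
• Within µ ± 1σ: 68.2% of the values
• Within µ ± 2σ: 95.4% of the values
• Within µ ± 3σ: 99.7% of the values
Examples:
• Height of humans (axis values in the figure: 4.6, 4.9, 5.2, 5.5, 5.8, 6.1, 6.4)
• Salaries of employees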
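The percentages on this slide can be checked numerically. The sketch below is an illustration that is not part of the original slides; it uses the standard relation between the normal distribution and the error function, and the helper name share_within is made up for this example. The printed values should agree with the slide's 68.2%, 95.4% and 99.7% up to rounding in the first decimal.

```python
import math


def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    # For a normal variable, P(|X - mu| <= k * sigma) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma of the mean: {share_within(k):.1%}")
```

The result does not depend on the particular mean or standard deviation, which is why the same three percentages apply to any normally distributed quantity.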
Linear Model
Definition:
A Linear model is one in which a constant change in the input/independent variable results in a constant change in the output/dependent variable.
Example 1 (exactly linear):
X   1    3    5    7    9      change: +2, +2, +2, +2
Y   10   20   30   40   50     change: +10, +10, +10, +10
Example 2 (approximately linear):
X   1    3    5     7     9      change: +2, +2, +2, +2
Y   4.8  10   15.3  20.2  25.3   change: +5.2, +5.3, +4.9, +5.1 ≈ 5
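The "constant change" idea above can be sketched in a few lines: take the difference between each pair of neighbouring values and see whether the differences stay (roughly) the same. The function name step_changes is a hypothetical helper for this illustration, and the data is the two X/Y tables from the slide.

```python
def step_changes(values):
    """Differences between consecutive values, rounded to one decimal place."""
    return [round(b - a, 1) for a, b in zip(values, values[1:])]


x = [1, 3, 5, 7, 9]
y_exact = [10, 20, 30, 40, 50]
y_noisy = [4.8, 10, 15.3, 20.2, 25.3]

print("change in X:", step_changes(x))
print("change in Y (first table):", step_changes(y_exact))
print("change in Y (second table):", step_changes(y_noisy))
```

In the second table the changes in Y are not identical, but they all sit close to 5, which is why the slide still treats the relationship as linear.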
Linear Modelling
x   y
0   0
1   2
2   4
3   6
4   8
5   ? (10)
y = 2x
[Figure: plot of Y against X showing the line y = 2x]
Equation of line:
y = mx + b (m: slope, b: y-intercept)
y = 2x + 0
Types of slope: (+)ve, (-)ve, Infinite Slope, No Slope
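As a small illustration of y = mx + b, the sketch below recovers the slope and y-intercept from two of the points in the table and uses them to fill in the missing value at x = 5. The function line_through is a hypothetical helper, not something defined in the slides.

```python
def line_through(p1, p2):
    """Return the slope m and y-intercept b of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line corresponds to the "infinite slope" case.
        raise ValueError("vertical line: the slope is infinite")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # value of y where the line crosses x = 0
    return m, b


m, b = line_through((0, 0), (4, 8))
prediction = m * 5 + b
print(f"y = {m}x + {b}; at x = 5 the model predicts y = {prediction}")
```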
Simple Linear Regression (Linear Modelling technique for Regression)
Problem:
As a hotel owner, you want to predict the tip amount ($) of a meal for any given bill amount. Therefore, one evening you collect data for six meals.
Meal #   Tip amount ($)
1        5
2        17
3        11
4        8
5        14
6        5
Unfortunately, when you begin to look at your data, you realize you only collected data for the tip amount and not the meal amount (total bill). So this is the best data you have.
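Simple Linear Regression (contd)
With only the tip amounts available, the best fit line is the mean tip, ȳ = 10, i.e. y = 0x + 10.
Meal #   Tip amount ($)   Error (tip - ȳ)
1        5                -5
2        17               +7
3        11               +1
4        8                -2
5        14               +4
6        5                -5
[Figure: tip amount plotted against meal #, with the horizontal best fit line ȳ = 10 and each point's error marked]
Sum of squared errors (SSE) = (-5)² + 7² + 1² + (-2)² + 4² + (-5)² = 120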
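The SSE of 120 on this slide can be reproduced directly from the six tip amounts. The sketch below is only a check of the slide's arithmetic; the variable names are chosen for this illustration.

```python
tips = [5, 17, 11, 8, 14, 5]

# With no predictor available, the mean tip is used as the prediction for every meal.
mean_tip = sum(tips) / len(tips)

# Error of each meal: actual tip minus the predicted (mean) tip.
errors = [tip - mean_tip for tip in tips]

# Sum of squared errors for the horizontal line y = 0x + mean_tip.
sse = sum(error ** 2 for error in errors)

print("mean tip:", mean_tip)
print("errors:", errors)
print("SSE:", sse)
```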
Simple Linear Regression (contd)
Total Bill Amount ($)   Tip amount ($)
34                      5
108                     17
64                      11
88                      8
99                      14
51                      5
[Figure: tip amount plotted against bill amount, with the line y = 0x + 10]
Simple Linear Regression (contd)
[Figure: tip amount plotted against bill amount (same six meals), with the line y = 0.08x + 6.2]
Simple Linear Regression (contd)
[Figure: tip amount plotted against bill amount (same six meals), with the line y = 0.11x + 1.8]
Simple Linear Regression (contd)
[Figure: tip amount plotted against bill amount (same six meals), with the line y = 0.14x - 0.81]
SSE = 30.075
• By tuning the slope and intercept, we find the line that best fits our data
• How do you tune? By using the Gradient Descent algorithm
How do we interpret y = 0.14x - 0.81?
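The slide names gradient descent as the tuning method. For a single predictor, the line with the smallest SSE also has a closed-form solution, and the sketch below uses that closed form instead (a swapped-in technique, not the slide's method) to check the numbers. The helpers fit_line and sse are hypothetical names for this example. On the six meals it gives a slope of about 0.1462 and an intercept of about -0.8203, which the slide shows as 0.14 and -0.81, and an SSE of about 30.075.

```python
bills = [34, 108, 64, 88, 99, 51]
tips = [5, 17, 11, 8, 14, 5]


def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor (closed form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b


def sse(xs, ys, m, b):
    """Sum of squared errors of the line y = m*x + b on the data."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


m, b = fit_line(bills, tips)
print(f"slope = {m:.4f}, intercept = {b:.4f}")
print(f"SSE of fitted line = {sse(bills, tips, m, b):.3f}")
print(f"SSE of y = 0x + 10 = {sse(bills, tips, 0, 10):.3f}")
```

Read as a model, the slope says that each extra $1 on the bill is associated with roughly $0.15 more tip, and the drop in SSE from 120 to about 30 shows how much the bill amount helps compared with predicting the mean tip for every meal.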
Logistic Regression (Linear Modelling technique for Classification)
Problem:
We have collected a sample dataset of people's ages and whether they subscribed to a magazine or not. Let's come up with a model that, given a person's age, predicts whether they will subscribe to the magazine or not.
Subscribed - 1, Not Subscribed - 0
Age in years   Subscribed
18             0
22             0
27             1
31             1
24             0
42             1
Can we use the same regression technique (fitting a line) that we learned so far to solve this?
No
Why?
• Data is categorical in nature
• Non-normal distribution [Binomial distribution]
• No linear relationship between age and subscription
But let's try
Logistic Regression
[Figure: SUBSCRIBED plotted against AGE, with a fitted straight line y = mx + b]
Age in years   Probability (p)
18             0.23
22             0.30
27             0.72
31             0.81
24             0.29
42             0.88
38             1.47   X
17             -0.20  X
The straight line produces probabilities above 1 and below 0 (marked X), which are not valid probabilities.
How do we solve this?
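The problem can be seen by fitting an ordinary least-squares line to the six age/subscription pairs from the previous slide. This is a sketch for illustration only: fit_line is a hypothetical helper, and the numbers it prints need not match the probability table on the slide. What matters is that the line's output leaves the 0 to 1 range for some ages.

```python
ages = [18, 22, 27, 31, 24, 42]
subscribed = [0, 0, 1, 1, 0, 1]


def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor (closed form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


m, b = fit_line(ages, subscribed)

for age in (17, 27, 42):
    value = m * age + b
    valid = 0 <= value <= 1
    print(f"age {age}: line gives {value:.2f} (valid probability: {valid})")
```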
Trick Intuition
[Figure: two plots of SUBSCRIBED against AGE]
Trick 1
[Figure: two plots of SUBSCRIBED against AGE]
y = mx + b, which ranges from -∞ to +∞
How do you ensure non-negativity of a number?
• Absolute value of a number: |-5| → +ve
• Squaring a number: (-5)² → +ve
• Exponential form of a number: e⁻⁵ → +ve
y = e^(mx + b), which ranges from 0 to +∞
Trick 2
How do you ensure that a number is <= 1?
• By dividing it by a number that is greater than it: 5/(5+1) = 0.833 → <= 1
y = e^(mx + b) / (1 + e^(mx + b)), which ranges from 0 to 1
[Figure: two plots of SUBSCRIBED against AGE: the curve y = e^(mx + b), which ranges from 0 to +∞, and the S-shaped curve y = e^(mx + b) / (1 + e^(mx + b)), with 0.5 marked]
This is called a Sigmoid Function
E(Y) => P = e^(mx + b) / (1 + e^(mx + b))
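The two tricks can be tried out numerically. In the sketch below the slope m = 0.3 and intercept b = -7.5 are hypothetical values picked only so that the curve crosses 0.5 at age 25; they are not fitted to the slide's data. The function names are likewise made up for this illustration.

```python
import math


def exp_line(x, m, b):
    """Trick 1: the exponential makes the line's output positive."""
    return math.exp(m * x + b)


def sigmoid_line(x, m, b):
    """Trick 2: divide by (1 + itself) to keep the result between 0 and 1."""
    e = exp_line(x, m, b)
    return e / (1 + e)


m, b = 0.3, -7.5  # hypothetical coefficients, chosen for illustration only

for age in (10, 18, 25, 32, 50):
    line = m * age + b
    print(
        f"age {age}: mx + b = {line:+.2f}, "
        f"e^(mx + b) = {exp_line(age, m, b):.4f}, "
        f"sigmoid = {sigmoid_line(age, m, b):.4f}"
    )
```

With these values the raw line runs from negative to positive, the exponential is always positive, and the sigmoid stays strictly between 0 and 1, passing through 0.5 where mx + b = 0.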
Linear Model Constraint
Linear Modelling technique for Regression
• Normal Distribution
• E(Y) = mx + b
  E(Y) = 0.14x - 0.81
i.e. we can explain the prediction as: for every $1 the bill amount increases, we would expect the tip amount to increase by $0.14, or about 15 cents.
Linear Modelling technique for Classification
• Binomial Distribution
• E(Y) = e^(mx + b) / (1 + e^(mx + b))
• E(Y) ≠ mx + b
i.e. we cannot explain the prediction as a linear combination of the independent variables.
This is the most important constraint of a Linear model.
Generalized Linear Model
Framework for Generalization
• Random Component: explains the distribution of our dependent variable
• Systematic Component: explains the dependent variable as a linear combination of the independent variables
• Link Function: establishes the relationship between the random and systematic components
Solve Linear Model Constraint using GLM
Linear Modelling technique for Regression
• Normal Distribution
• E(Y) = mx + b
  E(Y) = 0.14x - 0.81
i.e. we can explain the prediction as: for every $1 the bill amount increases, we would expect the tip amount to increase by $0.14, or about 15 cents.
Link Function: Identity Function
I(E(Y)) = mx + b
Linear Modelling technique for Classification
• Binomial Distribution
• E(Y) ≠ mx + b
• E(Y) = e^(mx + b) / (1 + e^(mx + b))
i.e. we cannot explain the prediction as a linear combination of the independent variables.
Link Function: Logit Function
Logit(E(Y)) = mx + b
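A short numerical check of the logit link: applying the logit, log(p / (1 - p)), to the sigmoid output gives back the linear expression mx + b. The coefficients m = 0.3 and b = -7.5 below are hypothetical values used only for illustration, and the function names are chosen for this sketch.

```python
import math


def sigmoid(z):
    """E(Y) for the classification case: e^z / (1 + e^z)."""
    return math.exp(z) / (1 + math.exp(z))


def logit(p):
    """Logit link: the log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))


m, b = 0.3, -7.5  # hypothetical coefficients, not fitted to the slide's data

for age in (18, 25, 42):
    linear_part = m * age + b
    p = sigmoid(linear_part)
    print(
        f"age {age}: E(Y) = {p:.4f}, "
        f"Logit(E(Y)) = {logit(p):+.4f}, mx + b = {linear_part:+.4f}"
    )
```

So although E(Y) itself is not a linear function of age, Logit(E(Y)) is, which is what lets logistic regression sit inside the linear modelling framework.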
Generalized Linear Model
Definition:
The Generalized Linear Model extends the General Linear Model by allowing the dependent variable to be related to a linear combination of the independent variables through a specified link function. Moreover, the model allows the dependent variable to have a non-normal distribution.
There are three components to a GLM:
1. Random Component
2. Systematic Component
3. Link Function
Some of the Generalized Linear Models
• Logistic Regression: Logit(E(Y)) = mx + b
• Probit Regression: Probit(E(Y)) = mx + b
• Poisson Regression: log(E(Y)) = mx + b
• Linear Regression: E(Y) = mx + b, i.e. I(E(Y)) = mx + b
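The four models above differ in their link function (and in the assumed distribution of Y), while the right-hand side mx + b stays the same. The sketch below pairs each link with its inverse, using only the Python standard library; the LINKS table and the predict_mean helper are hypothetical names for this illustration, and the coefficients in the usage lines are made up.

```python
import math
from statistics import NormalDist

_standard_normal = NormalDist()  # mean 0, standard deviation 1

# name -> (link g, inverse link), where g(E(Y)) = mx + b
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),              # Linear Regression
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),           # Logistic Regression
    "probit": (_standard_normal.inv_cdf, _standard_normal.cdf),  # Probit Regression
    "log": (math.log, math.exp),                               # Poisson Regression
}


def predict_mean(link_name, x, m, b):
    """E(Y) for a single predictor: apply the inverse link to mx + b."""
    _, inverse_link = LINKS[link_name]
    return inverse_link(m * x + b)


# Hypothetical coefficients, used only to show the call shape.
for name in LINKS:
    print(name, round(predict_mean(name, x=2, m=0.5, b=-1.0), 4))
```

Here mx + b = 0 for every model, so the identity link returns 0, the logit and probit links both return 0.5, and the log link returns 1, which shows how the same linear part maps to different ranges of E(Y).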
Thank You

More Related Content

What's hot

Logistic regression
Logistic regressionLogistic regression
Logistic regressionDrZahid Khan
 
What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...Smarten Augmented Analytics
 
Multivariate data analysis
Multivariate data analysisMultivariate data analysis
Multivariate data analysisSetia Pramana
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionVARUN KUMAR
 
Simple Linier Regression
Simple Linier RegressionSimple Linier Regression
Simple Linier Regressiondessybudiyanti
 
Multivariate Analysis Techniques
Multivariate Analysis TechniquesMultivariate Analysis Techniques
Multivariate Analysis TechniquesMehul Gondaliya
 
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsData Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsDerek Kane
 
Regression analysis
Regression analysisRegression analysis
Regression analysisSrikant001p
 
Regression analysis
Regression analysisRegression analysis
Regression analysissaba khan
 
7. logistics regression using spss
7. logistics regression using spss7. logistics regression using spss
7. logistics regression using spssDr Nisha Arora
 
Logistic regression with SPSS examples
Logistic regression with SPSS examplesLogistic regression with SPSS examples
Logistic regression with SPSS examplesGaurav Kamboj
 
Regression analysis
Regression analysisRegression analysis
Regression analysisRavi shankar
 

What's hot (20)

Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...
 
Poisson regression models for count data
Poisson regression models for count dataPoisson regression models for count data
Poisson regression models for count data
 
Multivariate data analysis
Multivariate data analysisMultivariate data analysis
Multivariate data analysis
 
Regression
RegressionRegression
Regression
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Sampling Distribution
Sampling DistributionSampling Distribution
Sampling Distribution
 
Simple Linier Regression
Simple Linier RegressionSimple Linier Regression
Simple Linier Regression
 
Multivariate Analysis Techniques
Multivariate Analysis TechniquesMultivariate Analysis Techniques
Multivariate Analysis Techniques
 
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsData Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Correlation
CorrelationCorrelation
Correlation
 
Multivariate analysis
Multivariate analysisMultivariate analysis
Multivariate analysis
 
7. logistics regression using spss
7. logistics regression using spss7. logistics regression using spss
7. logistics regression using spss
 
Multiple regression
Multiple regressionMultiple regression
Multiple regression
 
Logistic regression with SPSS examples
Logistic regression with SPSS examplesLogistic regression with SPSS examples
Logistic regression with SPSS examples
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 

Similar to Generalized linear model

A Study on the Short Run Relationship b/w Major Economic Indicators of US Eco...
A Study on the Short Run Relationship b/w Major Economic Indicators of US Eco...A Study on the Short Run Relationship b/w Major Economic Indicators of US Eco...
A Study on the Short Run Relationship b/w Major Economic Indicators of US Eco...aurkoiitk
 
Ch 6 Slides.doc/9929292929292919299292@:&:&:&9/92
Ch 6 Slides.doc/9929292929292919299292@:&:&:&9/92Ch 6 Slides.doc/9929292929292919299292@:&:&:&9/92
Ch 6 Slides.doc/9929292929292919299292@:&:&:&9/92ohenebabismark508
 
Logic design basics
Logic design basicsLogic design basics
Logic design basicsharishnn
 
Logistic Regression.ppt
Logistic Regression.pptLogistic Regression.ppt
Logistic Regression.ppthabtamu biazin
 
Curve_Fitting.pdf
Curve_Fitting.pdfCurve_Fitting.pdf
Curve_Fitting.pdfIrfan Khan
 
Regression Analysis.pptx
Regression Analysis.pptxRegression Analysis.pptx
Regression Analysis.pptxMdRokonMia1
 
Lesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing dataLesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing datamjlobetos
 
Solution manual for essentials of business analytics 1st editor
Solution manual for essentials of business analytics 1st editorSolution manual for essentials of business analytics 1st editor
Solution manual for essentials of business analytics 1st editorvados ji
 
Simple Regression Years with Midwest and Shelf Space Winter .docx
Simple Regression Years with Midwest and Shelf Space Winter .docxSimple Regression Years with Midwest and Shelf Space Winter .docx
Simple Regression Years with Midwest and Shelf Space Winter .docxbudabrooks46239
 
04. Growth_Rate_AND_Asymptotic Notations_.pptx
04. Growth_Rate_AND_Asymptotic Notations_.pptx04. Growth_Rate_AND_Asymptotic Notations_.pptx
04. Growth_Rate_AND_Asymptotic Notations_.pptxarslanzaheer14
 
Comm5005 lecture 4
Comm5005 lecture 4Comm5005 lecture 4
Comm5005 lecture 4blinking1
 
Data types - things you have to know!
Data types - things you have to know!Data types - things you have to know!
Data types - things you have to know!Karol Sobiesiak
 

Similar to Generalized linear model (20)

A Study on the Short Run Relationship b/w Major Economic Indicators of US Eco...
A Study on the Short Run Relationship b/w Major Economic Indicators of US Eco...A Study on the Short Run Relationship b/w Major Economic Indicators of US Eco...
A Study on the Short Run Relationship b/w Major Economic Indicators of US Eco...
 
Chapter 5
Chapter 5Chapter 5
Chapter 5
 
Regression
RegressionRegression
Regression
 
Static Models of Continuous Variables
Static Models of Continuous VariablesStatic Models of Continuous Variables
Static Models of Continuous Variables
 
Ch 6 Slides.doc/9929292929292919299292@:&:&:&9/92
Ch 6 Slides.doc/9929292929292919299292@:&:&:&9/92Ch 6 Slides.doc/9929292929292919299292@:&:&:&9/92
Ch 6 Slides.doc/9929292929292919299292@:&:&:&9/92
 
01_SLR_final (1).pptx
01_SLR_final (1).pptx01_SLR_final (1).pptx
01_SLR_final (1).pptx
 
Logic design basics
Logic design basicsLogic design basics
Logic design basics
 
Logistic Regression.ppt
Logistic Regression.pptLogistic Regression.ppt
Logistic Regression.ppt
 
Curve_Fitting.pdf
Curve_Fitting.pdfCurve_Fitting.pdf
Curve_Fitting.pdf
 
Ch15
Ch15Ch15
Ch15
 
Regression Analysis.pptx
Regression Analysis.pptxRegression Analysis.pptx
Regression Analysis.pptx
 
ERF Training Workshop Panel Data 3
ERF Training WorkshopPanel Data 3ERF Training WorkshopPanel Data 3
ERF Training Workshop Panel Data 3
 
Statistics For Management 3 October
Statistics For Management 3 OctoberStatistics For Management 3 October
Statistics For Management 3 October
 
Lesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing dataLesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing data
 
Solution manual for essentials of business analytics 1st editor
Solution manual for essentials of business analytics 1st editorSolution manual for essentials of business analytics 1st editor
Solution manual for essentials of business analytics 1st editor
 
Simple Regression Years with Midwest and Shelf Space Winter .docx
Simple Regression Years with Midwest and Shelf Space Winter .docxSimple Regression Years with Midwest and Shelf Space Winter .docx
Simple Regression Years with Midwest and Shelf Space Winter .docx
 
Regression
RegressionRegression
Regression
 
04. Growth_Rate_AND_Asymptotic Notations_.pptx
04. Growth_Rate_AND_Asymptotic Notations_.pptx04. Growth_Rate_AND_Asymptotic Notations_.pptx
04. Growth_Rate_AND_Asymptotic Notations_.pptx
 
Comm5005 lecture 4
Comm5005 lecture 4Comm5005 lecture 4
Comm5005 lecture 4
 
Data types - things you have to know!
Data types - things you have to know!Data types - things you have to know!
Data types - things you have to know!
 

Recently uploaded

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Recently uploaded (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

Generalized linear model

  • 2. Agenda  Refresher  Definition of Generalized Linear Model  What is a Normal Distribution  What is a Linear Model  Linear Modelling for Regression (Simple Linear Regression)  Linear Modelling for Classification (Logistic Regression)  Generalizing Linear Modelling of classification and Regression using GLM  Some of GLMs
  • 3. Refresher : Types of ML : Supervised Unsupervised Reinforcement GLM Classification Regression Response/output/Dependent variable Categorical (or) discrete Continuous Example • Yes/No • Survived/Dead • Lion/Tiger/Cheetah etc. • 100.70 • 25 • -75.25 -∞ to +∞
  • 4. Quiz : Question 1 : Classification It will rain - 1 It will not rain - 0 Suppose you are working on a weather Prediction model, and you would like to predict whether or not it will be raining At 5pm tomorrow Is this a Classification or a Regression problem ? Ans :
  • 5. Quiz : Question 2 : Regression Independent variable --> Experience in years Dependent variable --> Salary The HR department of an organization wants to have a salary prediction tool by which they want to decide on the salary of a new employee based on his/her experience Is this a Classification or a Regression problem ? Ans :
  • 6. Quiz : Question 3 : Regression Independent variable --> Weight of the car, Engine capacity Dependent variable --> Mileage Is this a Classification or a Regression problem ? Ans : Weight of car (kg) Engine capacity (Litre) Mileage (kmpl) 890 1.2 21 1200 1.6 19 920 2.2 15 700 1.0 22
  • 7. Generalized Linear Model Definition : Random Component The Generalized Linear Model expands the General Linear Model that allows Dependent variable to have a linear relationship with the independent variable via a specified link function. Moreover the model allows for the dependent variable to have a non-normal distribution. There are three components to a GLM : 1. Systematic Component2. Link Function3.
  • 8. Normal Distribution Definition : A Normal Distribution is an arrangement of dataset in which most of the values cluster in the middle(around the mean) and the rest of the values falls away from the mean. µ 1 σ 2 σ 3 σ-3 σ -2 σ -1 σ 68.2% 95.4% 99.7% Height of human Example 5.5 5.2 5.8 4.9 6.1 Salaries of Employees 4.6 6.4
  • 9. Linear Model Definition : A Linear model is one in which a constant change in input/Independent variable results in a constant change in output/Dependent variable. X 1 3 5 7 9 Y 10 20 30 40 50 +2 +2 +2 +2 +10 +10 +10 +10 X 1 3 5 7 9 Y 4.8 10 15.3 20.2 25.3 +2 +2 +2 +2 +5.2 +5.3 +4.9 +5.1 ≈ 5
  • 10. Linear Modelling x y 0 0 1 2 2 4 3 6 4 8 5 ?10 y = 2x 0 2 4 6 8 10 12 0 1 2 3 4 5 6 Y X Equation of line: y = mx + b y = 2x + 0 slope (+)ve (-)ve Infinite Slope y-intercept No Slope
  • 11. Simple Linear Regression (Linear Modelling technique for Regression) Meal # Tip amount ($) 1 5 2 17 3 11 4 8 5 14 6 5 Unfortunately, when you begin to look at your data, you realize you only collected data for tip amount and not the meal amount (total bill). So this is the best data you have. Problem: As a Hotel owner you want to predict the tip amount($) of a meal for any given bill amount. Therefore one evening you collect data for six meals.
  • 12. Simple Linear Regression (contd) Meal # Tip amount ($) 1 5 2 17 3 11 4 8 5 14 6 5 0 2 4 6 8 10 12 14 16 18 0 1 2 3 4 5 6 7 TIPAMOUNT MEAL # ȳ = 10 y = 0x + 10 +7 +1 +4 -5 -2 -5 Sum of squared error (SSE) = (-5) ² + 7² + 1² + (-2) ² + 4² + (-5) ² = 120 best fit line
  • 13. Simple Linear Regression (contd) Total Bill Amount ($) Tip amount ($) 34 5 108 17 64 11 88 8 99 14 51 5 0 2 4 6 8 10 12 14 16 18 20 30 40 50 60 70 80 90 100 110 120 TIPAMOUNT BILL AMOUNT y = 0x + 10 y = 0x + 10
  • 14. Simple Linear Regression (contd) Total Bill Amount ($) Tip amount ($) 34 5 108 17 64 11 88 8 99 14 51 5 0 2 4 6 8 10 12 14 16 18 20 30 40 50 60 70 80 90 100 110 120 TIPAMOUNT BILL AMOUNT y = 0.08x + 6.2
  • 15. Simple Linear Regression (contd) Total Bill Amount ($) Tip amount ($) 34 5 108 17 64 11 88 8 99 14 51 5 0 2 4 6 8 10 12 14 16 18 20 30 40 50 60 70 80 90 100 110 120 TIPAMOUNT BILL AMOUNT y = 0.11x + 1.8
  • 16. Simple Linear Regression (contd) Total Bill Amount ($) Tip amount ($) 34 5 108 17 64 11 88 8 99 14 51 5 0 2 4 6 8 10 12 14 16 18 20 30 40 50 60 70 80 90 100 110 120 TIPAMOUNT BILL AMOUNT y = 0.14x – 0.81 SSE = 30.075• By Tuning the slope and intercept we make a best fit of line for our data • How do you tune ? By using Gradient Descent Algorithm Ho do we interpret y = 0.14x – 0.81
  • 17. Logistic Regression (linear modelling technique for classification). Problem: We have collected a sample dataset of people's ages and whether or not they subscribed to a magazine. Let's come up with a model that, given a person's age, predicts whether they will subscribe. Coding: Subscribed = 1, Not subscribed = 0. Age in years → subscribed: 18 → 0, 22 → 0, 27 → 1, 31 → 1, 24 → 0, 42 → 1. Can we use the same regression technique (fitting a line) that we have learned so far to solve this? No. Why? • The data is categorical in nature • The distribution is non-normal (binomial) • There is no linear relationship between age and subscription. But let's try.
  • 18. Logistic Regression. Age in years → subscribed: 18 → 0, 22 → 0, 27 → 1, 31 → 1, 24 → 0, 42 → 1. [Plot: subscribed against age, with a fitted line y = mx + b.] Reading the line's output as a probability p, by age: 18 → 0.23, 22 → 0.30, 27 → 0.72, 31 → 0.81, 24 → 0.29, 42 → 0.88, 38 → 1.47 (X), 17 → -0.20 (X). The last two values fall outside the range 0 to 1, so they are not valid probabilities. How do we solve this?
  • 19. Trick Intuition. [Two plots: subscribed against age.]
  • 20. Trick 1. [Two plots: subscribed against age.] The line y = mx + b ranges from -∞ to +∞. How do you ensure that a number is non-negative? • Absolute value of a number: |-5| → +ve • Squaring a number: (-5)² → +ve • Exponential form of a number: e⁻⁵ → +ve. Taking the exponential gives y = e^(mx + b), which ranges from 0 to +∞.
  • 21. Trick 2. How do you ensure that a number is <= 1? • By dividing it by a number that is greater than itself: 5/(5+1) = 0.833, which is <= 1. Starting from y = e^(mx + b), which ranges from 0 to +∞, this gives y = e^(mx + b) / (1 + e^(mx + b)), which ranges from 0 to 1. This is called a sigmoid function: E(Y) => P = e^(mx + b) / (1 + e^(mx + b)). [Two plots: subscribed against age; one is annotated with the value 0.5.]
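The two tricks can be combined in a short sketch. The slope `m = 0.2` and intercept `b = -5.0` below are hypothetical values chosen only for illustration; they are not fitted to the subscription data and do not come from the slides. The function is written in a numerically stable form that is algebraically the same as e^z / (1 + e^z).

```python
import math

def sigmoid(z):
    """e^z / (1 + e^z), computed so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients, for illustration only.
m, b = 0.2, -5.0

ages = [17, 18, 22, 24, 27, 31, 38, 42]
probabilities = [sigmoid(m * age + b) for age in ages]

for age, p in zip(ages, probabilities):
    print(age, round(p, 3))
```

Unlike the straight line, every output stays strictly between 0 and 1, including the ages 17 and 38 that gave invalid values with the linear fit.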
  • 22. Linear Model Constraint. Linear modelling technique for regression: • Normal distribution • E(Y) = mx + b, e.g. E(Y) = 0.14x - 0.81, i.e. we can explain the prediction as: for every $1 the bill amount increases, we would expect the tip amount to increase by $0.14, or about 15 cents. Linear modelling technique for classification: • Binomial distribution • E(Y) = e^(mx + b) / (1 + e^(mx + b)) • E(Y) ≠ mx + b, i.e. we cannot explain the prediction as a linear combination of the independent variables. This is the most important constraint of a linear model.
  • 23. Generalized Linear Model: Framework for Generalization. • Random Component: explains the distribution of our dependent variable. • Systematic Component: explains the dependent variable as a linear combination of the independent variables. • Link Function: establishes the relationship between the random and systematic components.
  • 24. Solve Linear Model Constraint using GLM. Linear modelling technique for regression: • Normal distribution • E(Y) = mx + b, e.g. E(Y) = 0.14x - 0.81, i.e. we can explain the prediction as: for every $1 the bill amount increases, we would expect the tip amount to increase by $0.14, or about 15 cents • Link function: I(E(Y)) = mx + b (identity function). Linear modelling technique for classification: • Binomial distribution • E(Y) ≠ mx + b • E(Y) = e^(mx + b) / (1 + e^(mx + b)), i.e. we cannot explain the prediction as a linear combination of the independent variables • Link function: Logit(E(Y)) = mx + b (logit function).
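To see why the logit link restores a linear relationship, note that logit(p) = log(p / (1 - p)) is the inverse of the sigmoid. The sketch below checks this numerically; the coefficients `m` and `b` are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Maps a linear predictor z = mx + b to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p in (0, 1); the inverse of sigmoid."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, for illustration only.
m, b = 0.2, -5.0

for age in [18, 27, 42]:
    linear_predictor = m * age + b
    expected_y = sigmoid(linear_predictor)   # E(Y), not linear in age
    recovered = logit(expected_y)            # Logit(E(Y)), linear in age
    print(age, round(expected_y, 3), round(recovered, 3), round(linear_predictor, 3))
```

E(Y) itself is not a linear function of age, but Logit(E(Y)) is, which is exactly the statement Logit(E(Y)) = mx + b.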
  • 25. Generalized Linear Model Definition: The Generalized Linear Model extends the General Linear Model by allowing the dependent variable to be related to a linear combination of the independent variables via a specified link function. Moreover, the model allows the dependent variable to have a non-normal distribution. There are three components to a GLM: 1. Random Component 2. Systematic Component 3. Link Function
  • 26. Some of the Generalized Linear Models. • Logistic Regression: Logit(E(Y)) = mx + b • Probit Regression: Probit(E(Y)) = mx + b • Poisson Regression: log(E(Y)) = mx + b • Linear Regression: E(Y) = mx + b, i.e. I(E(Y)) = mx + b
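Each model in this list pairs the same linear predictor mx + b with a different link. One way to compare them is through the inverse link, which maps the linear predictor back to E(Y). The sketch below uses only the standard library; the dictionary name `INVERSE_LINKS` and the sample value of the linear predictor are illustrative assumptions.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Inverse link for each model: E(Y) as a function of eta = mx + b.
INVERSE_LINKS = {
    "logistic": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # inverse of logit
    "probit": normal_cdf,                                  # inverse of probit
    "poisson": math.exp,                                   # inverse of log
    "linear": lambda eta: eta,                             # identity
}

eta = 0.0  # sample linear predictor, chosen for illustration
means = {name: inverse(eta) for name, inverse in INVERSE_LINKS.items()}
print(means)
```

The logistic and probit inverse links keep E(Y) between 0 and 1, the log link keeps it positive as a count mean requires, and the identity link leaves it unrestricted.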
  • 27. References • http://www.statisticshowto.com/probability-and-statistics/normal-distributions/ • https://machinelearningmastery.com/simple-linear-regression-tutorial-for-machine-learning/ • https://www.analyticsvidhya.com/blog/2015/11/beginners-guide-on-logistic-regression-in-r/ • https://www.youtube.com/watch?v=zAULhNrnuL4 • https://www.youtube.com/watch?v=W3OaWyHEPv0