Generalized linear model

Generalized Linear Model
By Rahul Narayanan

Agenda
• Refresher
• Definition of a Generalized Linear Model
• What is a Normal Distribution?
• What is a Linear Model?
• Linear Modelling for Regression (Simple Linear Regression)
• Linear Modelling for Classification (Logistic Regression)
• Generalizing Linear Modelling for Classification and Regression using GLM
• Some GLMs

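Refresher:
Types of ML: Supervised, Unsupervised, Reinforcement
GLM belongs to Supervised learning and covers both Classification and Regression.
Classification
• Response/output/dependent variable: categorical (or) discrete
• Examples: Yes/No; Survived/Dead; Lion/Tiger/Cheetah etc.
Regression
• Response/output/dependent variable: continuous, from -∞ to +∞
• Examples: 100.70; 25; -75.25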

Quiz:
Question 1:
Suppose you are working on a weather prediction model, and you would like to predict whether or not it will be raining at 5 pm tomorrow. Is this a Classification or a Regression problem?
Ans: Classification
• It will rain - 1
• It will not rain - 0

Quiz:
Question 2:
The HR department of an organization wants a salary prediction tool to decide the salary of a new employee based on his/her experience. Is this a Classification or a Regression problem?
Ans: Regression
• Independent variable --> Experience in years
• Dependent variable --> Salary

Quiz:
Question 3:
Weight of car (kg)    Engine capacity (litre)    Mileage (kmpl)
890                   1.2                        21
1200                  1.6                        19
920                   2.2                        15
700                   1.0                        22
Ans: Regression
• Independent variables --> Weight of the car, Engine capacity
• Dependent variable --> Mileage

Definition:
The Generalized Linear Model extends the General Linear Model by allowing the dependent variable to be related to a linear combination of the independent variables through a specified link function. Moreover, the model allows the dependent variable to have a non-normal distribution.
There are three components to a GLM:
1. Random Component
2. Systematic Component
3. Link Function

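Normal Distribution
Definition:
A Normal Distribution is an arrangement of a dataset in which most of the values cluster in the middle (around the mean) and the rest of the values fall away from the mean on either side.
[Figure: bell curve centred on the mean µ, with the horizontal axis marked from -3σ to +3σ]
• 68.2% of the values lie within µ ± 1σ
• 95.4% of the values lie within µ ± 2σ
• 99.7% of the values lie within µ ± 3σ
Examples:
• Height of humans (axis labels on the slide: µ = 5.5; ±1σ = 5.2 and 5.8; ±2σ = 4.9 and 6.1; ±3σ = 4.6 and 6.4)
• Salaries of employees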
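The percentages above can be checked numerically. The sketch below is illustrative only: it uses the standard identity P(|Z| < k) = erf(k / √2) for a normal variable, and it assumes σ = 0.3 for the height example, a value read off the spacing of the axis labels rather than stated on the slide. The results agree with the slide's figures to within rounding.

```python
import math

def within_k_sigma(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Height example: mean 5.5 from the slide; sigma = 0.3 is an assumption
# inferred from the axis labels (5.2, 5.8, 4.9, 6.1, ...).
mu, sigma = 5.5, 0.3
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    print(f"{low:.1f} to {high:.1f}: {within_k_sigma(k):.1%}")
```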

Linear Model
Definition:
A Linear Model is one in which a constant change in the input/independent variable results in a constant change in the output/dependent variable.
Example 1 (exactly linear): each +2 step in X gives a +10 step in Y
X    1     3     5     7     9
Y    10    20    30    40    50
Example 2 (approximately linear): each +2 step in X gives a step of about +5 in Y (+5.2, +5.3, +4.9, +5.1)
X    1     3     5     7     9
Y    4.8   10    15.3  20.2  25.3
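The "constant change" check can be sketched in a few lines of Python. The helper name `first_differences` is made up for this illustration; it simply subtracts each value from the next one, using the two X/Y examples from the slide.

```python
def first_differences(values):
    """Return the change between each pair of consecutive values."""
    return [round(b - a, 2) for a, b in zip(values, values[1:])]

x = [1, 3, 5, 7, 9]
y_exact = [10, 20, 30, 40, 50]
y_noisy = [4.8, 10, 15.3, 20.2, 25.3]

print(first_differences(x))        # steps in X
print(first_differences(y_exact))  # steps in Y, exactly linear case
print(first_differences(y_noisy))  # steps in Y, approximately linear case
```

If the steps in Y are (roughly) constant whenever the steps in X are constant, a linear model is a reasonable description of the data.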

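Linear Modelling
x    y
0    0
1    2
2    4
3    6
4    8
5    ? (10)
The points follow y = 2x, so the missing value at x = 5 is 10.
[Figure: plot of Y (0 to 12) against X (0 to 6) showing the line y = 2x]
Equation of a line:
y = mx + b, where m is the slope and b is the y-intercept
Here: y = 2x + 0
Types of slope:
• Positive (+ve) slope
• Negative (-ve) slope
• Infinite slope (vertical line)
• No slope (horizontal line, m = 0)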
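As a small sketch of how m and b in y = mx + b can be recovered, the code below takes two points from the table and computes the slope and y-intercept. The function name `line_through` is hypothetical, introduced only for this example.

```python
def line_through(p1, p2):
    """Return (slope m, intercept b) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is infinite")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # value of y where x = 0
    return m, b

m, b = line_through((1, 2), (4, 8))
print(m, b)       # slope and y-intercept
print(m * 5 + b)  # predicted y for x = 5
```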

Simple Linear Regression (Linear Modelling technique for Regression)
Problem:
As a hotel owner, you want to predict the tip amount ($) of a meal for any given bill amount. Therefore, one evening you collect data for six meals.
Meal #    Tip amount ($)
1         5
2         17
3         11
4         8
5         14
6         5
Unfortunately, when you begin to look at your data, you realize you only collected data for the tip amount and not the meal amount (total bill). So this is the best data you have.

Simple Linear Regression (contd)
[Figure: scatter plot of TIP AMOUNT (0 to 18) against MEAL # (1 to 6) with a horizontal line at the mean tip]
With the tip amount as the only variable, the best fit line is the mean: ȳ = 10, i.e. y = 0x + 10
Meal #    Tip amount ($)    Error (tip - ȳ)
1         5                 -5
2         17                +7
3         11                +1
4         8                 -2
5         14                +4
6         5                 -5
Sum of squared error (SSE) = (-5)² + 7² + 1² + (-2)² + 4² + (-5)² = 120
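The mean-only baseline can be reproduced with a few lines of Python. This is a minimal sketch using the six tip amounts from the table; the variable names are illustrative.

```python
tips = [5, 17, 11, 8, 14, 5]

mean_tip = sum(tips) / len(tips)            # the horizontal line y = 0x + mean
residuals = [t - mean_tip for t in tips]    # observed tip minus predicted tip
sse = sum(r ** 2 for r in residuals)        # sum of squared errors

print(mean_tip, residuals, sse)
```

Any model that also uses the bill amount is only worth having if it brings the SSE below this baseline.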

Total Bill Amount ($) Tip amount ($)
34 5
108 17
64 11
88 8
99 14
51 5
[Figure: scatter plot of TIP AMOUNT (0 to 18) against BILL AMOUNT (20 to 120) for the six meals, with the line y = 0x + 10]

[Figure: scatter plot of TIP AMOUNT against BILL AMOUNT for the six meals, with the candidate line y = 0.08x + 6.2]

[Figure: scatter plot of TIP AMOUNT against BILL AMOUNT for the six meals, with the candidate line y = 0.11x + 1.8]

[Figure: scatter plot of TIP AMOUNT against BILL AMOUNT for the six meals, with the fitted line y = 0.14x – 0.81]
SSE = 30.075
• By tuning the slope and intercept, we find the line that best fits our data
• How do you tune? By using the Gradient Descent algorithm
How do we interpret y = 0.14x – 0.81?
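A minimal gradient descent sketch for this data is shown below. The slides do not give the learning rate, the number of iterations, or any feature scaling, so those are assumptions of this example: the bill amount is standardized first so that a single learning rate works for both parameters, and the result is converted back to the original scale. Under these assumptions the fit should land close to the slide's slope, intercept and SSE, with the slide's coefficients being shortened to two decimals.

```python
bills = [34, 108, 64, 88, 99, 51]
tips = [5, 17, 11, 8, 14, 5]
n = len(bills)

# Standardize x (assumption of this sketch, not stated in the slides).
x_mean = sum(bills) / n
x_std = (sum((x - x_mean) ** 2 for x in bills) / n) ** 0.5
xs = [(x - x_mean) / x_std for x in bills]

w, c = 0.0, 0.0      # slope and intercept on the standardized scale
learning_rate = 0.1  # hypothetical setting
for _ in range(2000):
    errors = [w * x + c - y for x, y in zip(xs, tips)]
    grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # d(MSE)/dw
    grad_c = 2 / n * sum(errors)                             # d(MSE)/dc
    w -= learning_rate * grad_w
    c -= learning_rate * grad_c

# Convert back to the original scale: tip = m * bill + b
m = w / x_std
b = c - w * x_mean / x_std
sse = sum((m * x + b - y) ** 2 for x, y in zip(bills, tips))
print(round(m, 4), round(b, 4), round(sse, 3))
```

The slope m is the expected change in tip for each additional $1 of bill, and the intercept b is the value of the line at a bill of $0.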

Logistic Regression (Linear Modelling technique for Classification)
Problem:
We have collected a sample dataset of people's ages and whether they subscribed to a magazine or not. Let's come up with a model that, given a person's age, predicts whether he or she will subscribe to the magazine.
Subscribed - 1, Not Subscribed - 0
Age in years    Subscribed
18              0
22              0
27              1
31              1
24              0
42              1
Can we use the same regression technique (fitting a line) that we have learned so far to solve this?
No
Why?
• Data is categorical in nature
• Non-Normal Distribution [Binomial distribution]
• No linear relationship between age and subscription
But let's try

Logistic Regression
[Figure: scatter plot of SUBSCRIBED (0 to 1.6) against AGE (10 to 50) with a fitted straight line y = mx + b]
Age in years    Probability (p)
18              0.23
22              0.30
27              0.72
31              0.81
24              0.29
42              0.88
38              1.47     X (greater than 1)
17              -0.20    X (less than 0)
How do we solve this?
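The problem can be reproduced with a short sketch. It fits a straight line to the 0/1 subscription data using the closed-form least-squares formulas (an assumed fitting method; the slides do not say how their line was obtained), then predicts for ages 17 and 38. The exact numbers differ from the illustrative probabilities on the slide, but the same failure shows up: one prediction falls below 0 and the other above 1.

```python
ages = [18, 22, 27, 31, 24, 42]
subscribed = [0, 0, 1, 1, 0, 1]
n = len(ages)

x_mean = sum(ages) / n
y_mean = sum(subscribed) / n
# Least-squares slope and intercept for y = mx + b
m = sum((x - x_mean) * (y - y_mean) for x, y in zip(ages, subscribed)) / sum(
    (x - x_mean) ** 2 for x in ages
)
b = y_mean - m * x_mean

def predict(age):
    """Straight-line 'probability', not restricted to the range 0 to 1."""
    return m * age + b

for age in (17, 38):
    print(age, round(predict(age), 3))
```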

Trick Intuition
[Figure: two plots of SUBSCRIBED (0 to 1.6) against AGE (10 to 50)]

Trick 1
[Figure: two plots of SUBSCRIBED (0 to 1.6) against AGE (10 to 50)]
y = mx + b, which ranges from -∞ to +∞
How do you ensure non-negativity of a number?
• Absolute value of a number: |-5| → +ve
• Squaring a number: (-5)² → +ve
• Exponential form of a number: e⁻⁵ → +ve
y = e^(mx + b), which ranges from 0 to +∞
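A quick numerical illustration of Trick 1: whatever sign the linear part mx + b takes, its exponential stays positive. The slope and intercept below are hypothetical values chosen only to produce both negative and positive linear outputs.

```python
import math

m, b = 0.5, -3.0  # hypothetical slope and intercept, for illustration only

for x in (-10, 0, 6, 10):
    linear = m * x + b  # can be negative, zero or positive
    print(x, linear, math.exp(linear))  # the exponential is always > 0
```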

Trick 2
How do you ensure that a number is <= 1?
• By dividing it by a number that is greater than it
5/(5+1) = 0.833 → <= 1
y = e^(mx + b), which ranges from 0 to +∞
y = e^(mx + b) / (1 + e^(mx + b)), which ranges from 0 to 1
[Figure: two plots of SUBSCRIBED (0 to 1.6) against AGE (10 to 50), with the 0.5 level marked on the second]
This is called a Sigmoid Function
E(Y) => P = e^(mx + b) / (1 + e^(mx + b))
Linear Model Constraint
Linear Modelling technique for Regression
• Normal Distribution
• E(Y) = mx + b
  E(Y) = 0.14x – 0.81
i.e. we can explain the prediction as: for every $1 the bill amount increases, we would expect the tip amount to increase by $0.14, or about 15 cents
E(Y) = mx + b is the most important constraint of a Linear model
Linear Modelling technique for Classification
• Binomial Distribution
• E(Y) = e^(mx + b) / (1 + e^(mx + b))
• E(Y) ≠ mx + b
i.e. we cannot explain the prediction as a linear combination of the independent variables

Framework for Generalization
• Random Component: explains the distribution of our dependent variable
• Systematic Component: explains the dependent variable as a linear combination of the independent variables
• Link Function: establishes the relationship between the random and systematic components

Solve Linear Model Constraint using GLM
Linear Modelling technique for Regression
• Normal Distribution
• E(Y) = mx + b
  E(Y) = 0.14x – 0.81
i.e. we can explain the prediction as: for every $1 the bill amount increases, we would expect the tip amount to increase by $0.14, or about 15 cents
• Link Function: Identity Function, I(E(Y)) = mx + b
Linear Modelling technique for Classification
• Binomial Distribution
• E(Y) ≠ mx + b
• E(Y) = e^(mx + b) / (1 + e^(mx + b))
i.e. we cannot explain the prediction as a linear combination of the independent variables
• Link Function: Logit Function, Logit(E(Y)) = mx + b
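Why the logit works as the link can be seen numerically: if p = e^(mx + b) / (1 + e^(mx + b)), then log(p / (1 - p)) gives back mx + b, so the transformed expectation is linear again. The sketch below checks this round trip. The coefficients are hypothetical values for the age example, not fitted ones.

```python
import math

def sigmoid(z):
    """Inverse of the logit: maps mx + b to a probability."""
    return 1 / (1 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))

m, b = 0.3, -8.0  # hypothetical coefficients, for illustration only
for age in (18, 27, 42):
    z = m * age + b
    p = sigmoid(z)
    # logit(p) recovers the linear predictor z
    print(age, round(p, 3), round(logit(p), 3), round(z, 3))
```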

Some of the Generalized Linear Models
• Logistic Regression: Logit(E(Y)) = mx + b
• Probit Regression: Probit(E(Y)) = mx + b
• Poisson Regression: log(E(Y)) = mx + b
• Linear Regression: I(E(Y)) = mx + b, i.e. E(Y) = mx + b
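The four link functions listed above can be put side by side in a short sketch. Each entry pairs a link g with its inverse, so that g(E(Y)) = mx + b and E(Y) = g⁻¹(mx + b). The dictionary name `LINKS` and the value of the linear predictor are made up for this illustration. The probit link is taken to be the inverse of the standard normal CDF, available as `statistics.NormalDist().inv_cdf` in Python 3.8 and later.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, used by the probit link

# Each entry: (link g, inverse link g^-1), so that g(E(Y)) = mx + b.
LINKS = {
    "linear (identity)": (lambda mu: mu, lambda eta: eta),
    "logistic (logit)": (lambda mu: math.log(mu / (1 - mu)),
                         lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_std_normal.inv_cdf, _std_normal.cdf),
    "poisson (log)": (math.log, math.exp),
}

eta = 0.4  # hypothetical value of the linear predictor mx + b
for name, (link, inverse) in LINKS.items():
    mean = inverse(eta)  # E(Y) implied by the linear predictor
    print(f"{name}: E(Y) = {mean:.4f}, link(E(Y)) = {link(mean):.4f}")
```

In every case the link applied to E(Y) returns the same linear predictor, which is what lets one framework cover all four models.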

