Logit and Probit and Tobit model: Basic Introduction

What is Regression Analysis?
• The technique of estimating the unknown value of a
dependent variable from the known values of one or more
independent variables is called regression analysis.
E.g.: the effect of a price increase upon demand, or
the effect of changes in the money supply upon the
inflation rate

Regression Lines
A regression line is a line that best describes the
linear relationship between the two variables.
y = a + bx
[Figure: regression lines Y = a + bX and Y = a - bX, both with intercept a; X axis horizontal, Y axis vertical]

Assumptions for regression
Measurement:
• All independent variables: interval/ratio/dichotomous
• Dependent variable: interval/ratio
Specification:
• Linear relationship between the dependent and
independent variables
Expected value of the error term:
• Zero

Homoscedasticity
• Variance of the error term is the same (constant)
Normality of error
• Errors are normally distributed for each set of values of the
independent variables
Absence of multicollinearity

Limitations of linear regression
Violation of measurement assumptions
• Dependent variable: if it is dichotomous
E.g.: smoker and non-smoker
adopter and non-adopter
participating and non-participating
• Independent variable: a dichotomous IV is acceptable when it is dummy coded
• E.g.: male and female

Shall we use the linear probability model (LPM)?
Yes, but…
• Non-normality of the errors ui
• Heteroscedastic variances of the errors
• Non-fulfillment of 0 ≤ E(Yi|Xi) ≤ 1
• Questionable value of R² as a measure
of goodness of fit
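
The third problem in the list above can be seen with a small sketch. The data below are hypothetical and the helper name fit_ols is an assumption made for illustration; the idea is only that a straight line fitted to a 0/1 outcome can produce fitted "probabilities" below 0 or above 1.

```python
# Minimal sketch: fit a linear probability model (OLS on a 0/1 outcome)
# by hand and inspect the fitted "probabilities". Data are hypothetical.

def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: x is a regressor, y = 1 for adopters, 0 for non-adopters.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

a, b = fit_ols(x, y)
fitted = [a + b * xi for xi in x]

for xi, pi in zip(x, fitted):
    flag = "" if 0 <= pi <= 1 else "  <- outside [0, 1]"
    print(f"x = {xi}: fitted P = {pi:.3f}{flag}")
```

With these made-up values the fitted line dips below 0 at the smallest x and rises above 1 at the largest x, which is what motivates models that bound the predicted probability.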

Presentation
on
Logit, Probit and Tobit Models
Rabeesh Kumar Verma
Roll no.: 10756
Division of Agricultural Extension, ICAR-IARI

What is logistic regression?
• Used to analyze relationships between a dichotomous
dependent variable and metric or dichotomous
independent variables
• Combines the independent variables to estimate the
probability that a particular event will occur or not
• Logistic regression (LR) is a nonlinear regression model that constrains
the predicted probabilities to lie between 0 and 1
• It could be called a qualitative response/discrete choice
model in the terminology of economics

Assumptions:
• No linear relationship between the dependent and
independent variables is required
• Homoscedasticity is not required
• Normality is not required
• The error terms need to be independent
• It requires fairly large sample sizes
• Absence of perfect multicollinearity

Features of the logit model:
• As P goes from 0 to 1, the logit L = ln(P / (1 - P)) goes from -∞ to +∞.
That is, although probabilities lie between 0 and 1, logits are
not so bounded.
• L is linear in X, but the probabilities themselves are not,
which is in contrast with the LPM, where probabilities
increase linearly with X.
• If the coefficient of a regressor in the logit is positive, the odds
that the regressand equals 1 increase as that regressor increases;
if it is negative, the odds decrease.
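
A short sketch may help make the first two features concrete. It assumes the usual definition of the logit, L = ln(P / (1 - P)); the function names logit and inverse_logit are chosen here for illustration only.

```python
import math

def logit(p):
    """Log-odds L = ln(p / (1 - p)) for a probability 0 < p < 1."""
    return math.log(p / (1 - p))

def inverse_logit(l):
    """Probability P = 1 / (1 + exp(-L)) implied by a logit L."""
    return 1 / (1 + math.exp(-l))

# Probabilities near 0 or 1 map to large negative or positive logits.
for p in (0.001, 0.1, 0.5, 0.9, 0.999):
    print(f"P = {p:<5}  L = {logit(p):+.3f}")

# Equal steps in L do not give equal steps in P (P is not linear in L).
for l in (-4, -2, 0, 2, 4):
    print(f"L = {l:+d}  P = {inverse_logit(l):.3f}")
```

The second loop shows the contrast with the LPM: moving L from 0 to 2 changes P much more than moving L from 2 to 4.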

Level of measurement:
• Logistic regression analysis requires that the dependent
variable be dichotomous.
• Logistic regression analysis requires that the
independent variables be metric or dichotomous.
• If an independent variable is nominal level and not
dichotomous, the logistic regression procedure in SPSS
has an option to dummy code the variable.
• If an independent variable is ordinal, the usual caution
about treating ordinal data as metric applies.

Variables in logistic regression:
• In a typical logistic regression analysis there will always be
one dependent (dichotomous) variable and
• Usually a set of independent variables that may be either
dichotomous or quantitative, or some combination of the two.

Sample size: Logit model and Probit
model
• The minimum number of cases per independent
variable is 10
(Hosmer and Lemeshow, Applied Logistic Regression)
• For example:
If we are using 8 independent variables, then the
minimum sample size should be 8 x 10 = 80

Logistic regression equation
ln(Pi / (1 - Pi)) = a + b1X1 + b2X2 + … + bnXn
where
Pi = probability of the event happening
e.g.: adoption of technology
(1 - Pi) = probability of the event not happening
e.g.: non-adoption of technology
X1, X2, …, Xn = independent variables
b1, b2, …, bn = regression coefficients
a = constant (intercept)
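
As a rough illustration of how the equation is used, the sketch below solves it for Pi. The coefficient values and the farmer's characteristics are hypothetical numbers invented for this example, not estimates from any study, and predicted_probability is an illustrative helper name.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Solve ln(P / (1 - P)) = a + b1*X1 + ... + bn*Xn for P."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical values, for illustration only.
a = -1.0          # constant (intercept)
b = [0.08, 0.5]   # b1: years of schooling, b2: access to credit (1 = yes, 0 = no)
x = [10, 1]       # one farmer: 10 years of schooling, has access to credit

p = predicted_probability(a, b, x)
print(f"Predicted probability of adoption: {p:.3f}")
```

Whatever values are plugged in, the result stays strictly between 0 and 1, because the linear part is on the log-odds scale rather than the probability scale.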

Example:
Dependent variable: Adoption / Non-adoption
Independent variable | Description | Hypothesized relation
Age | Chronological age of farmer in years | -
Education | Number of years of formal schooling | +
Land holding | Farm size measured in acres | +
Access to training | Yes = 1 / No = 0 | +
Distance to market | In kilometers | -
Access to credit | Yes = 1 / No = 0 | +
Extension services | Yes = 1 / No = 0 | +

Logit: predicted probabilities

Case 1:
A Logit Analysis of Bt Cotton Adoption and Assessment
of Farmers’ Training Need
Padaria et al., 2009

Contd… (Padaria et al., 2009)
Reading the logistic regression output:
• B: the regression coefficient, used to assess whether an independent variable is
significant in the model
• Standard error: the standard error associated with each coefficient
• Wald chi-square value and two-tailed p-value: used in testing the null hypothesis
that the coefficient (parameter) is 0
• Degrees of freedom: for the Wald chi-square test
• Exp(B): the exponentiation of the B coefficient, which is an odds ratio. This value is
given by default because odds ratios can be easier to interpret than the coefficient
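
The quantities in such an output table are related by simple formulas, sketched below. The coefficient and standard error are hypothetical numbers, not values from Padaria et al.; the sketch assumes the common single-coefficient form of the Wald statistic, (B / SE)², compared against a chi-square distribution with 1 degree of freedom.

```python
import math

def odds_ratio(b):
    """Exp(B): the odds ratio implied by a logit coefficient B."""
    return math.exp(b)

def wald_test(b, se):
    """Wald chi-square (1 df) and two-tailed p-value for H0: coefficient = 0."""
    z = b / se
    wald = z ** 2
    # For 1 df, P(chi-square > z^2) equals the two-tailed normal p-value.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return wald, p_value

# Hypothetical coefficient and standard error, for illustration only.
b, se = 0.85, 0.30
wald, p_value = wald_test(b, se)
print(f"B = {b}, S.E. = {se}")
print(f"Wald = {wald:.3f}, p = {p_value:.4f}, Exp(B) = {odds_ratio(b):.3f}")
```

An Exp(B) above 1 corresponds to a positive B (higher odds of the event), and an Exp(B) below 1 to a negative B.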

Advantages of the logit model:
• Transforms a dichotomous
dependent variable into a continuous variable
• Results are easily interpretable
• The method is simple to analyse
• It gives parameter estimates that are asymptotically
consistent, efficient and normal, so that the analogue
of the regression t-test can be applied.

Limitations:
• As in the case of the linear probability model, the disturbance term in the
logit model is heteroscedastic when the model is fitted by least squares
to grouped data, and therefore we should go for weighted least squares.
• As in many other regressions, there may be a problem of
multicollinearity if the explanatory variables are related
among themselves

Applications of the logit model:
1. It can be used to identify the factors that affect the adoption of a
particular technology, say the use of new crop varieties, fertilizers,
pesticides, etc. on the farm.
2. The model is used extensively for analyzing growth phenomena such as
population, GNP, money supply, etc.
3. In the field of marketing, it can be used to study brand preference and
brand loyalty
4. Gender studies can use logit analysis to find out the factors that
affect the decision-making status of men and women in the family

Probit regression model:
• The probit model is a type of regression where the dependent
variable can take only two values, for example adoption or
non-adoption, married or not married.
• The purpose of the model is to estimate the probability that the event occurs
• The estimating model that emerges from the normal cumulative
distribution function (CDF) is popularly known as the probit
model
• Sometimes it is also called the normit model.
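
As a minimal sketch of what "emerging from the normal CDF" means in practice, the probability in a probit model is the standard normal CDF evaluated at the linear index. The one-regressor form and the coefficient values below are assumptions made for illustration.

```python
from statistics import NormalDist

def probit_probability(a, b, x):
    """P(Y = 1 | x) = Phi(a + b*x), with Phi the standard normal CDF."""
    return NormalDist().cdf(a + b * x)

# Hypothetical intercept and slope, for illustration only.
a, b = -1.5, 0.25
for x in (0, 4, 6, 8, 12):
    print(f"x = {x:>2}: P(Y = 1) = {probit_probability(a, b, x):.3f}")
```

As with the logit, the predicted probability is bounded between 0 and 1 for any value of the index.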

Probit: level of measurement
requirements
• Dependent variable: dichotomous/categorical
• E.g.: adoption and non-adoption,
participation and non-participation
• Independent variables: metric or dichotomous
• E.g.: age (ratio-level data)
• Gender: male/female (dichotomous)

Case 2 : Factors Affecting Adoption of Improved Rice Varieties
among Rural Farm Households in Central Nepal
Ghimire (2015), published in Rice Science

Difference between
Logit and Probit models:
Logit | Probit
Slightly flatter tails | The conditional probability Pi approaches 0 or 1 at a faster rate
Based on the standard logistic distribution | Based on the standard normal distribution
Variance = π²/3 | Variance = 1
Simple mathematics | Sophisticated mathematics
Both give similar results; the choice of method depends on the researcher,
but logit regression is more often preferred
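
The first and third rows of the comparison can be checked numerically. This is only a sketch using the standard logistic and standard normal CDFs; the function names are illustrative.

```python
import math
from statistics import NormalDist

def logistic_cdf(z):
    """CDF of the standard logistic distribution (basis of the logit model)."""
    return 1 / (1 + math.exp(-z))

def normal_cdf(z):
    """CDF of the standard normal distribution (basis of the probit model)."""
    return NormalDist().cdf(z)

# Variance of the standard logistic distribution; the standard normal has variance 1.
LOGISTIC_VARIANCE = math.pi ** 2 / 3

print(f"Logistic variance = {LOGISTIC_VARIANCE:.4f}, normal variance = 1")
for z in (-4, -3, -2, 0, 2, 3, 4):
    print(f"z = {z:+d}: logistic = {logistic_cdf(z):.5f}, normal = {normal_cdf(z):.5f}")
```

At the same z, the normal CDF is closer to 0 or 1 than the logistic CDF in both tails, which is the sense in which the probit probability approaches its limits faster.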

Significance of the Wald test
• Tests the statistical significance of the unique
contribution of each coefficient in the
model
• This test is similar to the t-test in
multiple regression

Ordinal logit & probit models
• Both are used
• when the outcome has more than 2 categories and the categories are ordinal in nature
• The dependent variable:
• E.g. 1: Likert-type scale: strongly agree, somewhat agree, strongly
disagree
• E.g. 2: less than high school (0), high school (1), college (2), post
graduate (3)
• The independent variables remain the same as in the logit and probit
models

Multinomial logit and multinomial
probit
• Used when the dependent variable has more than 2 categories
and the categories are not ordinal in nature.
• E.g. 1: adoption of different adaptation strategies
• E.g. 2: dependent variable = choice of transportation to work
• E.g. 3: occupation classification: unskilled, semiskilled, highly
skilled

Multinomial logit model
Kassie et al., 2008
Dependent variable: compost, conservation tillage, both
This gives three categories, i.e. more than 2 categories

Tobit model
• An extension of the probit model.
• Developed by James Tobin (Nobel laureate
economist)
• Used when information on the regressand is available
only for some observations in the sample.
• Such samples are called censored samples.
• Therefore the Tobit model is also known as a censored
regression model.
• Sometimes also called a limited dependent variable
regression model

Contd…
• Example:
• Suppose we have a set of consumers and we are
interested in finding out the amount of money a
person or family spends on a house in relation to
socioeconomic variables.
• Here we have a dilemma…
• If a consumer does not purchase a house, obviously
we have no data on housing expenditure for such
consumers; we have such data only for the
consumers who actually purchase a house.

• Thus, consumers are divided into two groups consisting
of, say, n1 and n2 consumers
• n1: those about whom we have information on the regressors
(say income, number of people, mortgage interest rate) as
well as the regressand (amount of expenditure on a house)
• n2: those about whom we have information only on the
regressors but not on the regressand.
• Now a question arises:

• Can we estimate the regression using only the n1
observations and not worry about the remaining n2
observations?
• The answer is no.
• The OLS estimates of the parameters obtained
from the subset of n1 observations will be biased as
well as inconsistent.
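
A small simulation can illustrate why using only the n1 observations is a problem. Everything here is an assumption made for the sketch: the true intercept 0 and slope 1, normal errors, the uniform regressor and the seed. The latent values for the n2 group are available only because the data are simulated; in real data they would be unobserved.

```python
import random

def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Simulate a latent outcome y* = 0 + 1*x + u with standard normal errors.
rng = random.Random(0)
xs = [rng.uniform(-2, 2) for _ in range(5000)]
ys = [0.0 + 1.0 * x + rng.gauss(0, 1) for x in xs]

# n1 group: the regressand is observed only when it is positive.
n1 = [(x, y) for x, y in zip(xs, ys) if y > 0]

_, slope_all = fit_ols(xs, ys)
_, slope_n1 = fit_ols([x for x, _ in n1], [y for _, y in n1])

print(f"True slope: 1.0")
print(f"OLS slope, all n1 + n2 latent values: {slope_all:.3f}")
print(f"OLS slope, n1 observations only:      {slope_n1:.3f}")
```

In this setup the slope estimated from the n1 subset is pulled toward zero, and drawing a larger sample does not remove the gap, which is what biased and inconsistent mean here.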

• Statistically, we can express the tobit model as
• Yi = β1 + β2Xi + ui   if RHS > 0
•    = 0                otherwise
• where RHS = right-hand side
• Note: additional X variables can easily be added to
the model.

• Truncated sample:
• To be distinguished from a censored sample.
• In a truncated sample, information on the regressors
(IVs) is available only if the regressand (DV) is
observed.

• If we estimate a regression line based on the n1
observations only, the resulting intercept and slope
coefficients are bound to be different than if all the
(n1 + n2) observations were taken into account.

Mechanics of estimating the tobit
model:
• Tobit models are estimated by the method of maximum
likelihood (ML).
• James Heckman has proposed an alternative to ML
which is comparatively easy.
• The Heckman procedure yields consistent estimates
of the parameters, but they are not as efficient as the
ML estimates.
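
For readers who want to see what ML maximizes here, the sketch below writes out a tobit log-likelihood for one regressor and censoring at zero. It assumes the errors ui are normal with mean 0 and standard deviation sigma, which is the standard tobit assumption but is not stated on the slides; the function name and the tiny data set are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def tobit_loglik(beta1, beta2, sigma, xs, ys):
    """Log-likelihood of Yi = beta1 + beta2*Xi + ui, censored at 0.

    Assumes ui ~ N(0, sigma^2). Observations with y > 0 contribute the
    normal density of y; observations with y = 0 contribute the
    probability that the latent value is at or below 0.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        mu = beta1 + beta2 * x
        if y > 0:
            z = (y - mu) / sigma
            total += math.log(_STD_NORMAL.pdf(z) / sigma)
        else:
            total += math.log(_STD_NORMAL.cdf(-mu / sigma))
    return total

# Hypothetical data: expenditure is 0 for those who did not purchase.
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0.0, 0.0, 1.2, 2.1, 2.4]

# Compare two candidate parameter sets; ML picks the set with the highest value.
print(tobit_loglik(0.0, 1.0, 1.0, xs, ys))
print(tobit_loglik(-1.0, 1.5, 1.0, xs, ys))
```

In practice a numerical optimizer would search over beta1, beta2 and sigma; the sketch only evaluates the function at given values.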

Nested regression analysis
• A nested model is one in which you incrementally
add variables such that every subsequent model is a
superset of the preceding one.
• For example, if y = a + bx is the first model, then
the second model would be something like y = a +
bx + cz +....
• The advantage of this set-up is that it allows you to
compare different specifications and ultimately
investigate the relative importance of specific
variables.

• Note that a model is nested if and only if the next
model contains exactly the terms of the
preceding one and has at least one additional term.
• On the other hand, a two-stage model is one in
which two equations are estimated one after the
other, with the second-stage equation including a
predicted value (usually the predicted outcome or
residuals) from the first-stage equation

Conclusion
• Be clear about the assumptions of the different
regression models.
• Researchers should be well aware of the different models
and use them according to the defined research problem.
• Logit and probit models are extensively used in
health science, behavioral and social sciences.
• These models are extensively used in social research when the
dependent variable is dichotomous.

References:
• Meyers, L. S., Gamst, G., & Guarino, A. J.
(2006). Applied Multivariate Research: Design and
Interpretation
• Padaria et al. (2009). A Logit Analysis of Bt Cotton
Adoption and Assessment of Farmers’ Training Need.
Indian Res. J. Ext. Edu. 9(2)
• Gujarati, D. N., et al. (2012). Basic Econometrics. McGraw
Hill Education, India
