How to use Logistic Regression in GIS using ArcGIS and R statistics
1. Logistic Regression in GIS using R environment
Omar F. Althuwaynee, PhD
Geomatics Engineering
2. Before starting, go through the following videos on
data preparation, software, and more information about R:
1. Prediction (susceptibility mapping) In GIS (Part 1) Using Modified Frequency Ratio
2. Using R as GIS tools: here is my own learning experience (Week 1 -Part 1)
3. Using R as GIS tools: here is my own learning experience (Week 1 -Part 2)
Course preparations
3. 1. Evaluate and compare the results of applying the
multivariate logistic regression method to
produce a susceptibility map, using GIS and the R
environment.
Course objectives
4. 1. Create dichotomous (0, 1) training and testing data.
2. Effectively set up your project environment and install
the required packages in R.
3. Prepare spatial data in the R environment.
4. Learn basic operations with spatial data in R.
By the end of these sessions, you will be able to
5. 5. Develop cartographic mapping skills in R,
such as resampling and clipping raster data.
6. Run statistical analysis using binary logistic
regression.
7. Run statistical tests and produce output reports.
8. Run accuracy and validation tests using the area under the ROC curve (AUC).
9. Produce and export the resulting maps.
By the end of these sessions, you will be able to
6. • LR is a method for fitting a regression curve, y = f(x), when y is a categorical variable.
• LR is part of a larger class of algorithms known as
generalized linear models (glm).
• Dependent factor y
– Binomial logistic regression: the dependent factor (y) has 2
values (either 0 or 1), e.g. landslides or pollutants (0 = does not
exist, 1 = exists).
– Multinomial logistic regression: the dependent variable has more
than 2 values, e.g. classifying fruit as “Ripe”, “Over-ripe” or
“Under-ripe”.
• Independent factors x
– A set of predictors x (slope, elevation, land use, etc.).
– The predictors (x) can be continuous, categorical or a mix of
both.
Logistic regression (LR)
7. • To predict the probability that a landslide will
occur (y = 1) at a particular place, or not (y = 0).
Data:
• Dependent factor Y (landslide training data locations): 75
observations.
• Independent factors X (elevation, slope, NDVI, curvature).
Current Application
8. The model predicts the probability of occurrence by fitting the data to a logit function:
$g(y) = \beta_0 + \beta_1(\text{Elevation}) + \beta_2(\text{Slope}) + \beta_3(\text{NDVI}) + \beta_4(\text{Curvature})$ (a)
where
g() is the link function.
For now, denote g() by p.
• A probability must always be positive (never negative), so we put the linear equation
in exponential form, which is positive for any value of the coefficients and the independent variables.
$p = \exp(\beta_0 + \beta_1(\text{Elevation}) + \dots) = e^{\beta_0 + \beta_1(\text{Elevation}) + \dots}$ (b)
• To make the probability less than 1, we must divide p by a number greater
than p.
$p = \dfrac{\exp(\beta_0 + \beta_1(\text{Elevation}) + \dots)}{\exp(\beta_0 + \beta_1(\text{Elevation}) + \dots) + 1} = \dfrac{e^{\beta_0 + \beta_1(\text{Elevation}) + \dots}}{e^{\beta_0 + \beta_1(\text{Elevation}) + \dots} + 1}$ (c)
Equations
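Steps (a) to (c) can be traced numerically. The following is a minimal sketch in plain Python (not the R workflow used in the course). The coefficient values and the single grid cell are hypothetical, chosen only to show how a linear predictor is turned into a value between 0 and 1. The names `beta`, `cell`, `linear_predictor` and `probability` are illustrative and do not come from any package.

```python
import math

# Hypothetical coefficients and one hypothetical grid cell.
# None of these numbers come from the course data.
beta = {"intercept": -4.0, "elevation": 0.002, "slope": 0.08,
        "ndvi": -1.5, "curvature": 0.3}
cell = {"elevation": 350.0, "slope": 28.0, "ndvi": 0.4, "curvature": -0.2}


def linear_predictor(beta, cell):
    """Right-hand side of (a): b0 + b1*Elevation + b2*Slope + ..."""
    return beta["intercept"] + sum(beta[name] * value
                                   for name, value in cell.items())


def probability(eta):
    """Equations (b) and (c): exponentiate, then divide by (itself + 1)."""
    positive = math.exp(eta)          # (b): the exponential is always > 0
    return positive / (positive + 1)  # (c): dividing by a larger number keeps p < 1


eta = linear_predictor(beta, cell)
p = probability(eta)
print(f"linear predictor = {eta:.3f}, probability = {p:.3f}")
```

A linear predictor of 0 gives a probability of exactly 0.5; negative values give less than 0.5 and positive values more.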
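9. Using (a), (b) and (c), and writing y for the linear predictor on the right-hand side of (a), we can redefine the probability as:
$p = \dfrac{e^{y}}{1 + e^{y}}$ (d)
or, equivalently,
$p = \dfrac{1}{1 + e^{-y}}$
where
p is the probability of success (here, landslide occurrence).
Equation (d) is the logistic function, the inverse of the logit link function.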
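Solving (d) for y shows how it connects back to the link function in (a). In the short derivation below, y stands for the linear predictor and log is the natural logarithm. The last line is the log-odds, which is what the term "logit" usually refers to.

```latex
\begin{align*}
p &= \frac{e^{y}}{1 + e^{y}} \\
1 - p &= \frac{1}{1 + e^{y}} \\
\frac{p}{1 - p} &= e^{y} \\
\log\!\left(\frac{p}{1 - p}\right) &= y
\end{align*}
```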
A typical logistic model plot is an S-shaped curve: the probability never
goes below 0 or above 1.
Equations
10. A confusion matrix is simply a tabular representation of actual vs. predicted
values. It helps us find the accuracy of the model and, when computed on
testing data, check for overfitting.
Resource:
https://www.analyticsvidhya.com/blog/2015/11/beginners-guide-on-logistic-regression-in-r/
Confusion Matrix
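As a rough illustration of the actual vs. predicted table, here is a minimal sketch in plain Python (not the R workflow used in the course). The labels and predicted probabilities are made up for the example, the 0.5 cut-off is an assumption, and `confusion_matrix` and `accuracy` are hypothetical helper names.

```python
def confusion_matrix(actual, predicted_prob, threshold=0.5):
    """Count TP, FP, TN and FN for 0/1 labels and predicted probabilities."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(actual, predicted_prob):
        label = 1 if p >= threshold else 0  # turn a probability into 0/1
        if label == 1 and y == 1:
            counts["TP"] += 1
        elif label == 1 and y == 0:
            counts["FP"] += 1
        elif label == 0 and y == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def accuracy(counts):
    """Share of observations that were classified correctly."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Made-up testing data: 1 = landslide, 0 = no landslide.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted_prob = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]

counts = confusion_matrix(actual, predicted_prob)
print("          predicted 0  predicted 1")
print(f"actual 0  {counts['TN']:>11}  {counts['FP']:>11}")
print(f"actual 1  {counts['FN']:>11}  {counts['TP']:>11}")
print(f"accuracy = {accuracy(counts):.2f}")
```

Comparing the accuracy on the training data with the accuracy on the held-out testing data is one simple way to spot overfitting: a large gap suggests the model has memorised the training set.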
11. Note:
• The analysis depends only on the number of observations;
more training observations will increase the model efficiency.
• LR may produce a lower prediction rate, but the result will have
a higher confidence level (lower uncertainty compared with bivariate methods).
I will give more details in each video session.
Without further ado, let us begin!
Logistic regression