Isotonic regression is a statistical technique for fitting a free-form line to a sequence of observations such that the fitted line is non-decreasing (or non-increasing) everywhere and lies as close to the observations as possible. It is limited to predicting numeric output, so the dependent variable must be numeric in nature.
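As a concrete sketch (assuming scikit-learn is available; the data values are illustrative):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Illustrative, noisy but broadly increasing observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 0.8, 2.5, 2.1, 3.9, 4.2])

iso = IsotonicRegression(increasing=True)
y_fit = iso.fit_transform(x, y)  # fitted values, non-decreasing in x

# Adjacent violations (e.g. 1.2 followed by 0.8) are pooled into equal values
assert np.all(np.diff(y_fit) >= 0)
```

Pass `increasing=False` for a non-increasing fit.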
4. Terminologies
• Predictors and Target variable:
• The target variable, usually denoted by Y, is the variable being predicted; it is also called the dependent variable, output variable, response variable or outcome variable
• A predictor, usually denoted by X and sometimes called an independent or explanatory variable, is a variable that is used to predict the target variable
• Correlation:
• Correlation is a statistical measure that indicates the extent to which two variables fluctuate together
5. Terminologies (Continued...)
• Isotonic Constraints/Thresholds:
• These are the data points between which a set of linear models can be estimated, one per interval between thresholds, so as to minimize the error on the training data
6. Terminologies (Continued...)
• Monotonic Constraints:
• These require the fitted values to form an entirely non-decreasing or entirely non-increasing sequence; in isotonic regression they are typically taken to be increasing.
7. Introduction
• MODEL:
o Isotonic regression is the technique of fitting a free-form line to a sequence of observations such that the fitted line lies as close to the observations as possible while remaining monotonic.
• NOTATION:
● X = x1, x2, ..., xn, where xi are the observed values
● Y = y1, y2, ..., yn is a finite set of real-valued responses
● wi are positive weights
• OBJECTIVE:
o It is a variant of linear regression that builds the model in a piecewise-linear manner, i.e., it breaks the problem into few or many linear segments and performs linear interpolation within each segment.
• BENEFIT:
o Unlike linear regression, this model is not biased and is flexible.
o It helps in multidimensional scaling.
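In the notation above, the fit minimizes a weighted least-squares criterion subject to an ordering constraint; a standard way to write it (not stated explicitly on the slide) is:

```latex
\min_{\hat{y}_1,\dots,\hat{y}_n} \;\sum_{i=1}^{n} w_i \,\bigl(y_i - \hat{y}_i\bigr)^2
\quad \text{subject to} \quad \hat{y}_1 \le \hat{y}_2 \le \cdots \le \hat{y}_n
```

for the non-decreasing case; reverse the inequalities for the non-increasing case.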
8. Example: Isotonic Regression
Let's conduct an isotonic regression analysis on the Admission data set, with independent variables (Xi): CGPA, LOR, GRE_Score, TOEFL_Score and target variable (Y): Chance_of_Admit, as shown below:

Chance_of_Admit | CGPA | LOR | GRE_Score | TOEFL_Score
0.46 | 8.0 | 3.0 | 308 | 110
0.64 | 8.18 | 3.0 | 312 | 98
0.72 | 8.79 | 2.5 | 319 | 110
0.45 | 7.46 | 2.5 | 290 | 104
0.57 | 7.46 | 2.5 | 311 | 98

Model Summary:
R-Squared 0.782
Adjusted R-Squared 0.781
(The model is an excellent fit when Adjusted R-Squared > 0.7.)

• R-Squared: It shows the goodness of fit of the model. It lies between 0 and 1; the closer the value is to 1, the better the model.
• Adjusted R-Squared: The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. It shows whether adding additional predictors improves the regression model or not.
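The two summary statistics can be computed as follows (a NumPy sketch; `y_pred` holds hypothetical predictions for the five rows above, not the tool's actual output):

```python
import numpy as np

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    # n = number of observations, p = number of predictors
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

y_true = np.array([0.46, 0.64, 0.72, 0.45, 0.57])   # Chance_of_Admit column
y_pred = np.array([0.50, 0.62, 0.70, 0.47, 0.55])   # hypothetical predictions

r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n=len(y_true), p=1)
```

Note that the adjustment always pulls the score down, which is why the adjusted value (0.781) sits just below R-squared (0.782) in the summary above.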
9. Standard Input/Tuning Parameters & Sample UI
Step 1: Select the Target Variable from Chance_of_Admit, CGPA, LOR, GRE_Score, TOEFL_Score (here: Chance_of_Admit)
Step 2: Select the Predictor Variable(s) from the remaining variables: CGPA, LOR, GRE_Score, TOEFL_Score (more than one predictor can be selected)
Step 3: isotonic = true (this is the default value, based on the target variable, and typically denotes the increasing property of isotonic regression). By default, these parameters should be set with the values mentioned.
Step 4: Display the output window containing the following:
o Model Summary
o Line Fit Plot
o Residual Versus Fit Plot
Note:
▪ Categorical predictors should be auto-detected and converted to dummy/binary variables before applying regression
▪ The decision on the selection of predictors depends on business knowledge and on the correlation between the target variable and the predictors
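The note on categorical predictors can be sketched with pandas (assumed available); `House_Type` is a hypothetical categorical column, not part of the Admission data set:

```python
import pandas as pd

df = pd.DataFrame({
    "GRE_Score": [308, 312, 319],
    "House_Type": ["owned", "rented", "owned"],  # hypothetical categorical predictor
})

# Convert the categorical column into dummy/binary columns before regression
encoded = pd.get_dummies(df, columns=["House_Type"])
# encoded now has House_Type_owned and House_Type_rented instead of House_Type
```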
10. Sample Output: 1. Model Summary

R-Squared 0.782
Adjusted R-Squared 0.781
Root Mean Square Error (RMSE) 0.066
Mean Absolute Error (MAE) 0.048
Mean Absolute Percentage Error (MAPE) 0.0762619
Mean Percentage Error (MPE) -0.0111406

● R-Squared: It shows the goodness of fit of the model. It lies between 0 and 1; the closer the value is to 1, the better the model.
● Adjusted R-Squared: The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. It shows whether adding additional predictors improves the regression model or not.
11. Sample Output: 1. Model Summary (Continued)...
● Root Mean Square Error (RMSE): The square root of the average of the squared differences between prediction and actual observation. It is the standard deviation of the residual error.
● Mean Absolute Error (MAE): The average of the absolute differences between prediction and actual observation.
● Mean Absolute Percentage Error (MAPE): The mean of the absolute percentage ratio of the residual to the actual observation.
● Mean Percentage Error (MPE): Conveys whether there are more positive errors than negative errors, or vice versa, based on its sign.
RMSE, MAE, MAPE and MPE quantify how far the predicted values deviate from the actual values.
The lower the values, the better the fit of the regression model.
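A sketch of the four metrics with NumPy (the example vectors are illustrative, not the tool's actual predictions; errors are taken as predicted minus actual, matching the MPE sign convention used here):

```python
import numpy as np

def error_metrics(y_true, y_pred):
    err = y_pred - y_true                      # negative errors => underestimation
    rmse = np.sqrt(np.mean(err ** 2))          # root mean square error
    mae = np.mean(np.abs(err))                 # mean absolute error
    mape = np.mean(np.abs(err / y_true))       # mean absolute percentage error
    mpe = np.mean(err / y_true)                # mean percentage error (signed)
    return rmse, mae, mape, mpe

y_true = np.array([0.46, 0.64, 0.72, 0.45, 0.57])
y_pred = np.array([0.50, 0.62, 0.70, 0.47, 0.55])
rmse, mae, mape, mpe = error_metrics(y_true, y_pred)

# RMSE is never smaller than MAE
assert rmse >= mae > 0
```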
12. Sample Output: 2. Interpretation
[Influencer's Importance chart: bars for CGPA, LOR, GRE_Score, TOEFL_Score; Target Variable: Chance_of_Admit]
The Influencer's Importance chart shows the impact of each predictor on the target variable.
13. Sample Output: 3. Plots
[Line Fit Plot: Chance_of_Admit plotted against CGPA | Residual Versus Fit Plot: standardized residuals plotted against predicted Chance_of_Admit]
Line fit plots are used to check the assumption of linearity between each Xi and Y.
The residual versus fit plot is used to check the assumption of equal error variances and to detect outliers.
14. Interpretation of Important Model Summary Statistics

R-Squared:
• It shows the goodness of fit of the model. It lies between 0 and 1; the closer this value is to 1, the better the model.
• An R-squared between 0 and 0.7 represents a model that does not fit well; the assumptions of normality and linearity should be checked for better fitment of the model.

Adjusted R-Squared:
• The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. It shows whether adding additional predictors improves the regression model or not.
• If the value is > 0.7, the model shows a better correlation between the dependent and independent variables.
• The more the variables, the lower the adjusted R-squared score.

RMSE:
• The square root of the average of the squared differences between prediction and actual observation. It is the standard deviation of the residual error.
• Lower values of RMSE indicate a better fit. The value ranges from 0 to ∞.
15. Interpretation of Important Model Summary Statistics (Continued…)

MAE:
• The average of the absolute differences between prediction and actual observation.
• Lower values of MAE indicate a better fit. The value ranges from 0 to ∞.
• Like RMSE, it is a negatively oriented score.

MAPE:
• The mean of the absolute percentage ratio of the residual to the actual observation.
• The lower the MAPE, the better the performance of the model.

MPE:
• The Mean Percentage Error conveys whether there are more positive errors than negative errors, or vice versa, based on its sign.
• With more negative errors the system underestimates, and with more positive errors the system overestimates.
16. Interpretation of Plots: Line Fit Plot
• This plot shows the relationship between each Xi (predictor) and Y (target variable), with Y on the y-axis and each Xi on the x-axis.
• As shown in Figure 1 on the right, as temperature (X) increases, so does the yield (Y); hence there is a linear relationship between X and Y, and isotonic regression is applicable to this data.
• If the line does not display linearity, as shown in Figures 2 and 3 on the right, then a transformation can be applied to that particular variable before proceeding with model building.
• If data transformation does not help, then either that variable (Xi) can be dropped from the analysis or a nonlinear model should be chosen, depending on the distribution pattern of the scatter plot.
[Figure 1: Chance_of_Admit against CGPA; Figures 2 & 3: nonlinear patterns]
17. Interpretation of Plots: Residual Versus Fit Plot
• It is the scatter plot of standardized residuals on the y-axis against predicted (fitted) values on the x-axis.
• It is used to detect unequal residual variances and outliers in the data.
• Characteristics of a well-behaved residual vs. fits plot:
• The residuals should "bounce randomly" around the 0 line and should roughly form a "horizontal band" around it, as shown in Figure 1. This suggests that the variances of the error terms are equal.
• No single residual should "stand out" from the basic random pattern of residuals. This suggests that there are no outliers.
⮚ The plots shown in Figures 2 & 3 depict unequal error variances, which is not desirable for linear regression analysis.
[Figures 1, 2 & 3: residual versus fit plots]
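The standardized residuals behind such a plot can be sketched as follows (illustrative values; a |value| > 3 cut-off is one common outlier rule):

```python
import numpy as np

y_true = np.array([0.46, 0.64, 0.72, 0.45, 0.57])   # illustrative observations
y_fit  = np.array([0.50, 0.62, 0.70, 0.47, 0.55])   # illustrative fitted values

resid = y_true - y_fit
std_resid = resid / resid.std(ddof=1)   # standardized residuals

# A well-behaved plot bounces randomly around 0 with no point standing out
outliers = np.abs(std_resid) > 3
assert not outliers.any()
```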
18. Limitations
[Figure: time-independent error (fairly constant over time and lying within a certain range) versus time-dependent error (decreasing with time)]
• Isotonic regression is limited to predicting numeric output, i.e., the dependent variable must be numeric in nature.
• The minimum sample size should be at least 20 cases per independent variable.
• There is a significant risk of overfitting with a larger number of isotonic constraints/thresholds.
• Isotonic regression is monotonic and hence is not appropriate for fitting distributions that have left and right tails.
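The overfitting risk is easy to see: when the training responses already satisfy the ordering, isotonic regression reproduces them exactly, noise included (a sketch assuming scikit-learn; values are illustrative):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 1.1, 1.9, 3.2, 3.8, 5.1])  # strictly increasing "noisy" responses

y_fit = IsotonicRegression(increasing=True).fit_transform(x, y)

# With effectively one threshold per point, training error is zero:
# the model memorizes the noise rather than smoothing it
assert np.allclose(y_fit, y)
```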
19. Limitations (Continued…)
• It does not fit derivatives, so it will not approximate smooth curves like most distribution functions. It may be useful for heuristically approximating the predicted values, but it is not especially useful for extrapolation beyond the extreme values of the x-axis data.
• Target/independent variables should be normally distributed.
• A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme. It looks like a bell curve, as shown in Figure 1 on the right.
• Outliers (observations lying outside the overall pattern of the distribution) in the data, in both the target and the independent variables, can affect the analysis; hence outliers need to be removed, as shown in Figure 2 on the right.
20. Business Use Case 1
Business Problem: Decide Loan Eligibility based on Applicant’s Annual income, Employment Period,
Debt to Income Ratio etc.
Input Data: Predictor/Independent Variable(s) to determine Applicant’s Loan Eligibility:
• House Ownership Status
• Job Grade
• Employment Length
• Annual Income
• Loan Verification Status
• Debt to Income Ratio
Business Benefit: Loan applicants can discover which predictors lead to eligibility for the required loan amount before proceeding further, ensuring a systematic banking approach; it also assists banks in checking loan eligibility criteria before sanctioning a loan to the applicant.
21. Business Use Case 2
Business Problem: Predicting diamond prices using basic measurement metrics.
Input Data: Predictor/Independent Variable(s) to determine the price of a Diamond:
• Carat weight of Diamond
• Quality of the Cut
• Diamond Color
• Clarity
• Depth
• The width of the diamond’s table
Business Benefit:
The predictive model will provide details on the pricing of diamonds and enable analysis of the most
prominent factors and trends in the diamond market.
22. Want to Learn More?
Get in touch with us @ support@Smarten.com
And do check out the Learning section on Smarten.com
September 2021