Predictive Models
Predictive Modeling
 A statistical technique used to predict the outcome of future events based on historical data
 Involves building a mathematical model that takes relevant input variables and generates a predicted output variable
 ML algorithms are used to train and improve these models so that they make better decisions

 Used in many industries and applications:
 Fraud detection
 Customer segmentation
 Disease diagnosis
 Stock price prediction
[Figure: Descriptive Analytics → Predictive Analytics → Prescriptive Analytics]
Class characterization: An example
Predictive Modeling vs. Predictive Analytics
 Predictive Modeling is a subset of predictive analytics; it refers specifically to the model-building step of the overall process
 Predictive Analytics encompasses the entire process of using data, statistical algorithms, and ML techniques to make predictions about future events or outcomes
 PA includes data preparation and cleansing, data integration and exploration, developing and deploying models, and collaborating on and sharing the findings
Common Predictive Modeling Techniques/Algorithms
 Regression Models: Used for predicting continuous values or probabilities.
 Linear Regression
 Logistic Regression
 Polynomial Regression
 Classification Models: Used for predicting categorical outcomes.
 Decision Trees
 Random Forests
 Support Vector Machines (SVMs)
 Naive Bayes
 K-Nearest Neighbors (KNN)
 Neural Networks: Used for complex pattern recognition, particularly in image and sequential data.
 Multilayer Perceptrons (MLPs)
 Convolutional Neural Networks (CNNs)
 Recurrent Neural Networks (RNNs)
 Ensemble Methods: Combining multiple models for improved performance.
 Bagging (e.g., Random Forest)
 Boosting (e.g., Gradient Boosting Machines, XGBoost)
Regression Models
 (Numerical) prediction is the use of a model to predict a continuous or ordered value for a given input
 Major method for prediction: regression
 models the relationship between one or more independent (predictor) variables and a dependent (response) variable
 Regression analysis
 Linear and multiple regression
 Non-linear regression
 Other regression methods: generalized linear models, Poisson regression, log-linear models, regression trees
Linear Regression
 Linear regression involves a response variable $y$ and a single predictor variable $x$:
$$y = w_1 x + w_0$$
where $w_1$ (slope) and $w_0$ ($y$-intercept) are the regression coefficients
 Method of least squares: estimates the best-fitting straight line
$$w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the means of the predictor and response values over the training set $D$.
Example
 Consider the example below, where the mass, y (grams), of a chemical is related to the time, x (seconds), for which the chemical reaction has been taking place, according to the table:

Time x (seconds)   Mass y (grams)
5                  40
7                  120
12                 180
16                 210
20                 240

 Find the equation of the regression line.
 w1 = 12.208 and w0 = 11.506, so y = 12.208x + 11.506
 Find the mass of the chemical when time x = 14 s: y = 12.208(14) + 11.506 ≈ 182.4 grams
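As a sanity check, here is a minimal NumPy sketch (mine, not part of the original slides) that applies the least-squares formulas above to this table and reproduces the slide's coefficients:

```python
import numpy as np

x = np.array([5, 7, 12, 16, 20], dtype=float)        # time (seconds)
y = np.array([40, 120, 180, 210, 240], dtype=float)  # mass (grams)

# Least-squares estimates from the formulas on the previous slide
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()

print(f"w1 = {w1:.3f}, w0 = {w0:.3f}")            # w1 = 12.208, w0 = 11.506
print(f"mass at x = 14 s: {w1 * 14 + w0:.1f} g")  # ≈ 182.4
```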
Salary Data

x, years of experience   y, salary (in 1000s)
3                        30
8                        57
9                        64
13                       72
3                        36
6                        43
11                       59
21                       90
1                        20
16                       83

Problem: Given the salary data, predict the salary of a college graduate with 10 years of experience.
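The slides leave this one unanswered; a worked pass using the least-squares formulas above (my arithmetic, not part of the original deck): $\bar{x} = 9.1$, $\bar{y} = 55.4$, $w_1 = 1269.6/358.9 \approx 3.54$, $w_0 = 55.4 - 3.54 \times 9.1 \approx 23.2$, so $y \approx 3.54x + 23.2$ and the predicted salary at $x = 10$ is about $3.54(10) + 23.2 \approx 58.6$, i.e. roughly $58,600.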
Exercise

Mid Term (x)   Final Exam (y)
72             84
50             63
81             77
74             78
94             90
86             75
59             49
83             79
65             77
33             52
88             74
81             90

Predict the final exam grade of a student who received a 78 on the midterm examination.

Solution: |D| = 12; $\bar{x}$ = 866/12 = 72.167; $\bar{y}$ = 888/12 = 74;
w1 = 0.5816 and w0 = 32.028, so the fitted line is y = 32.028 + 0.5816x.
The predicted final exam mark of a student who received a 78 on the midterm examination is 32.028 + 0.5816(78) ≈ 77.39.
Key Assumptions
Conditions that should be met for linear regression to provide reliable results:
• Linearity: There must be a linear relationship between the independent and dependent variables.
• Independence of Errors: The errors (residuals) of the predictions should be independent of each other; one error should not influence another.
• Homoscedasticity: The variance of the errors should be constant across all levels of the independent variable.
• Normality of Errors: The errors (residuals) should follow a normal (bell-shaped) distribution.
• No Multicollinearity (for multiple regression): In models with more than one independent variable, the independent variables should not be highly correlated with each other.
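A brief, hypothetical sketch (not from the slides) of how two of these assumptions might be checked numerically, reusing the chemical-reaction data from the earlier example:

```python
import numpy as np
from scipy import stats

x = np.array([5, 7, 12, 16, 20], dtype=float)
y = np.array([40, 120, 180, 210, 240], dtype=float)

w1, w0 = np.polyfit(x, y, 1)        # least-squares line
residuals = y - (w1 * x + w0)

# Normality of errors: Shapiro-Wilk test (null hypothesis: residuals are normal)
stat, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p:.3f}")  # a small p suggests non-normal errors

# Homoscedasticity: inspect the spread of residuals across x
# (a funnel shape in these values would suggest non-constant variance)
for xi, ri in zip(x, residuals):
    print(f"x = {xi:5.1f}   residual = {ri:7.2f}")
```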
Multiple Linear Regression
Sales Price Prediction
 Given:

Size (sq.ft) x1   Bedrooms x2   Sale_Price y
3800              5             3,20,00,000
2200              4             1,20,00,000
1150              2             96,00,000
2000              3             1,00,00,000

 Predict the Sale_Price for:

Size   Bedrooms
3200   4

Sales_Price = w0 + w1 * Size + w2 * Bedrooms
Multiple linear regression
• Involves more than one predictor variable
– Training data is of the form (X1, y1), (X2, y2), …, (X|D|, y|D|)
– Ex.: for 2-D data, we may have $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Example
 Suppose we have the following dataset with one response variable y and two predictor variables X1 and X2:
 Find the relationship between the predictor and response variables.
[Slide: Regression Sum]
Exercise
Fit a multiple linear regression for the given dataset:
β0 = , β1 = , β2 =
Multiple Regression Using Matrices
$$\boldsymbol{\beta} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} Y$$
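A minimal sketch (my own construction, assuming the Sale_Price figures above are amounts written in Indian digit grouping) that applies this normal-equations formula to the Sales Price data:

```python
import numpy as np

size     = np.array([3800, 2200, 1150, 2000], dtype=float)
bedrooms = np.array([5, 4, 2, 3], dtype=float)
price    = np.array([3.2e7, 1.2e7, 9.6e6, 1.0e7])  # Sale_Price values

# Design matrix with a leading column of ones for the intercept w0
X = np.column_stack([np.ones_like(size), size, bedrooms])

# beta = (X^T X)^{-1} X^T y; np.linalg.solve is preferred to an explicit inverse
beta = np.linalg.solve(X.T @ X, X.T @ price)
w0, w1, w2 = beta

# Predict for Size = 3200 sq.ft and 4 bedrooms
print(f"predicted Sale_Price: {w0 + w1 * 3200 + w2 * 4:,.0f}")
```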
Non-linear Regression
 Nonlinear models: modeled by a polynomial function
 A polynomial regression model can be transformed into a linear regression model.
 For example, $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3$
 is convertible to linear form with the new variables $x_2 = x^2$, $x_3 = x^3$:
$$y = w_0 + w_1 x + w_2 x_2 + w_3 x_3$$
 Other functions, such as the power function, can also be transformed into a linear model
 Some models are intractably nonlinear (e.g., a sum of exponential terms)
 it is still possible to obtain least-squares estimates through extensive calculation on more complex formulae
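To make the transformation concrete, here is a short sketch (synthetic data of my own, not from the slides) that fits the cubic above by ordinary linear least squares on the derived features $x$, $x^2$, $x^3$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
# Synthetic data from a known cubic plus noise (hypothetical coefficients)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.3 * x**3 + rng.normal(0, 0.5, x.size)

# Derived variables x2 = x^2, x3 = x^3 turn the cubic into a *linear* model
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated [w0, w1, w2, w3]:", np.round(w, 2))  # close to [1.0, 2.0, -0.5, 0.3]
```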