Forecasting
• Forecasting is the process of making predictions based on past and
present data. It is a decision-making tool that helps businesses cope
with the uncertainty of the future by examining historical data and
trends.
• Short Term Forecasting
• Long Term Forecasting
3.
Benefits of Forecasting
• Better utilization of resources
• Formulating business plans
• Enhancing the quality of management
• Helps in establishing new business models
• Helps in making the best managerial decisions
4.
Predictive Analytics
• Predictive analytics is the process of using data to forecast future
outcomes. The process uses data analysis, machine learning,
artificial intelligence, and statistical models to find patterns that
might predict future behavior.
5.
Process of Predictive Analytics
• Defining the project
• Collecting the data
• Analyzing the data
• Deploying the statistics
6.
Regression
• A regression is a statistical technique that relates a dependent
variable to one or more independent (explanatory) variables. A
regression model is able to show whether changes observed in
the dependent variable are associated with changes in one or
more of the explanatory variables.
7.
Bivariate Regression
• Bivariate Regression Analysis involves analysing two variables to
establish the strength of the relationship between them. The two
variables are frequently denoted as X and Y, with one being an
independent variable (or explanatory variable), while the other is a
dependent variable (or outcome variable).
• It is used to identify whether any discrepancies exist between the two
variables and the causes of those differences.
• The data comprise paired measurements X and Y, which can be
interpreted as (X, Y). Together these pairs are called a bivariate simple
random sample (SRS).
Types of Bivariate Analysis
• The type of analysis depends on the types of variables involved. A
variable can be nominal, ordinal or numerical, giving three common
combinations:
• Numerical and numerical
• Categorical and categorical
• Numerical and categorical
10.
Bivariate Analysis
• The choice of variables (independent and dependent) must be made
properly.
Calorie Intake | Weight (pounds)
3500 | 250
2000 | 225
1500 | 110
2250 | 145
4500 | 380
• There is a Y value for each value of X. For example, a calorie intake of
3500 corresponds to a weight of 250 pounds.
Simple Linear Regression Model
• Simple linear regression analysis is used to predict the value of a
variable based on the value of another variable. The variable to
be predicted is called the dependent variable and the variable that is
used to predict it is called the independent variable.
• Y = b0 + b1X + e
• Y is dependent and X is independent
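As a sketch, the coefficients b0 and b1 can be estimated by ordinary least squares; the data below are invented purely for illustration (numpy assumed available):

```python
# A minimal sketch of the simple linear regression model Y = b0 + b1*X + e,
# fitted by ordinary least squares using numpy (illustrative data only).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# slope: Sxy / Sxx; intercept: ybar - b1 * xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)  # b0 = 0.14, b1 = 1.96 for this data
```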
13.
Multiple Linear Regression Model
• Multiple linear regression analysis is used when the value of one variable
depends on the values of many variables. That is, there is one dependent
variable and many independent variables.
• Eg: Yield of crop (Water, Soil, Fertilizer)
• These independent variables can be controlled. But the yield may also
depend on uncontrollable variables such as those associated with weather.
• A linear model relating the response y to several predictors is
Y = b0 + b1X1 + b2X2 + ………… + bkXk + e
• Y is dependent and X1, X2, …, Xk are independent
• b0, b1, b2, …, bk are called regression coefficients.
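A minimal sketch of fitting such a model by least squares with numpy, using synthetic data (the crop-yield variables above are only the motivation, not this data):

```python
# A sketch of fitting Y = b0 + b1*X1 + b2*X2 + e by least squares
# (synthetic data with known coefficients, for illustration).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # two predictors, e.g. water and fertilizer
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=50)

A = np.column_stack([np.ones(len(X)), X])  # design matrix with intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [1.0, 2.0, -3.0]
```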
14.
Multiple Linear Regression Model
• A model relating the response y to several predictors may not be
linear in the predictors and can be represented, for example, as
Y = b0 + b1X1 + b2X1² + b3 sin X2 + ………… + bkXk + e
15.
Application of Regression Models in Data
Analysis
• Forecasting – future opportunities and threats.
• CAPM – the Capital Asset Pricing Model links an asset's projected
return to its related market risk premium; it is based on linear
regression.
• Comparing with Competition – comparing a company's financial
performance to that of its counterparts – the relationship between two
firms.
• Identifying Problems – detecting judgement mistakes.
• Reliable Source – reduces guesswork.
16.
Calculate Linear Bivariate Regression in order
to model Commonly Encountered Problems.
Linear – fitting a straight line through the given points
Bivariate – two variables
The expression of linear regression is y = b0 + b1x + e
e is an error term that captures the variation in y not explained by the line.
b0 & b1 are the regression coefficients that need to be estimated.
b0 is the constant in the model, also called the intercept.
b1 is the slope of the model, interpreted as the change in ‘y’ for a change of 1 unit in ‘x’.
17.
Calculate Linear Bivariate Regression in order
to model Commonly Encountered Problems.
Consider the following example of a company which spends on advertising to improve sales.
For this data we have to develop a scatter plot.
Region | Spent on Advertising (x) | Sales (y)
1 | 6 | 10
2 | 9 | 12
3 | 8 | 12
4 | 3 | 4
5 | 10 | 12
6 | 4 | 6
18.
Calculate Linear Bivariate Regression in order
to model Commonly Encountered Problems.
Steps to develop a scatter plot
Select the data ‘x’ and ‘y’
Select the Insert tab, go to Charts and select Scatter Plot
Fitting a linear regression (Least Squares Estimation)
Select any one point on the plot and right-click to display properties
Select Add Trendline
Check the boxes, if needed, to display the equation and the R² value on the chart
19.
Calculate Linear Bivariate Regression in order
to model Commonly Encountered Problems.
The fitted value is the predicted value of the dependent variable.
[Scatter plot of Sales (y) against Advertising Spend (x) with the fitted regression line]
Residuals are the difference between the actual and fitted values of the dependent variable.
20.
Calculate Linear Bivariate Regression in order
to model Commonly Encountered Problems.
Residual = Observed Value – Fitted Value
If the point is above the line, the residual is positive; below the line, it is negative.
If the point lies on the line, the residual is zero.
The best-fitting line for the given data is the line with the smallest sum of squared residuals, called the least
squares line. Thus the fitted line is y = a + bx, which is the equation of a straight line.
21.
Calculate Linear Bivariate Regression in order
to model Commonly Encountered Problems.
Therefore the slope is b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²
The intercept is a = Ȳ − bX̄
The residual is e = Y − Ŷ, where Ŷ = a + bX is the fitted value.
The errors can be summarized by a single numerical value called the Standard Error, which is the standard
deviation of all the errors.
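These least-squares formulas can be sketched in Python, applied to the advertising/sales data from the earlier slide (the n − 2 divisor for the standard error is the usual choice for simple regression):

```python
# Least-squares slope b = Sxy/Sxx, intercept a = ybar - b*xbar,
# and residuals e = y - yhat for the advertising/sales data.
import numpy as np

x = np.array([6, 9, 8, 3, 10, 4], dtype=float)    # spent on advertising
y = np.array([10, 12, 12, 4, 12, 6], dtype=float) # sales

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
residuals = y - (a + b * x)

# Standard error: the standard deviation of the residuals,
# here with n - 2 degrees of freedom.
se = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))
print(round(a, 3), round(b, 3), round(se, 3))  # 1.424 1.186 1.221
```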
22.
Determine the Quality of Fit of a Linear Model
R² – goodness-of-fit measure for linear regression models
R² is the statistic that indicates the percentage of variance in the dependent variable that the independent
variables explain collectively.
R² measures the strength of the relationship between your model and the dependent variable on a
convenient scale of 0–100%.
One should determine how well the model fits the data after fitting a linear regression model.
[Scatter plot of Sales (y) against Advertising Spend (x) with the fitted regression line]
23.
Determine the Quality of Fit of a Linear Model
R² = Variance explained by the model / Total Variance
A higher R² value represents smaller differences between the observed data and the fitted values.
R² measures the strength of the relationship between your model and the dependent variable on a
convenient scale of 0–100%.
R² evaluates the scatter of the data points around the fitted regression line and is also called the coefficient
of determination (or the coefficient of multiple determination for multiple regression).
24.
Determine the Quality of Fit of a Linear Model
An R² value of 0% represents a model that does not explain any of the variation in the response variable
around its mean.
An R² value of 100% represents a model that explains all of the variation in the response variable around
its mean.
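As a sketch, R² can be computed for the advertising/sales data from the earlier slides; for a least-squares fit, 1 − SSres/SStot equals the explained-variance ratio above:

```python
# R^2 = 1 - SS_res / SS_tot for the fitted least-squares line
# on the advertising/sales data.
import numpy as np

x = np.array([6, 9, 8, 3, 10, 4], dtype=float)
y = np.array([10, 12, 12, 4, 12, 6], dtype=float)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # 0.903 — about 90% of the variance in sales is explained
```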
25.
Establishing its Usefulness for Prediction
Predictive analytics gives accurate and reliable insights, helping organisations to solve problems and
identify opportunities, such as:
Detecting Fraud
Identifies patterns to detect and prevent criminal behavior – identifies abnormalities – safeguards the
data
Reducing Risk
PA is used in the finance sector – gives an accurate picture of the customer for decision making. Eg: credit scores
Optimising Marketing Campaigns
Customer responses & purchases
Improving Decision Making
The more data the software has, the better the decisions
26.
Establishing its Usefulness for Prediction
Improving Efficiency
Forecast inventory and manage resources – optimize performance and increase revenue
27.
Multivariate Regression Analysis
Multivariate regression analysis is a supervised machine learning technique involving multiple data
variables for analysis.
It has one dependent variable and many independent variables.
Based on the values of these variables, the output is predicted.
Simple regression is linear, meaning a straight line represents y as a function of x.
Multiple regression is planar, representing extra dimensions:
Y = b0 + b1x1 + b2x2 + ……….. + bnxn
28.
Assumptions
Independent Variables arenot much correlated to each other – No Multicollinearity
Relationship between Dependent and Independent Variables – Must be Linear
Residual Variance is constant – Residual Errors must be distributed equally
Independence of observation – Observations and Residuals must be independent – Watson Statistic (0
to 4)
Multivariate Normality – normal residual distribution
29.
Calculate Linear Multivariate Regression in order to model Commonly
Encountered Problems.
Notation for the Population Model
A multivariate regression model relating the y-variable to ‘k’ x-variables of a population can be written as
Yi = β0 + β1Xi1 + β2Xi2 + ………… + βkXik + εi
‘k’ represents the number of independent variables. This implies there are k + 1 regression coefficients.
εi is the error term, having a normal distribution with mean 0 and constant variance.
The subscript ‘i’ represents the ith individual.
The subscript following ‘i’ in the ‘x’ variable denotes which x-variable it is.
Therefore the multivariate regression model for a sample can be written by replacing β with b.
30.
Calculate Linear MultivariateRegression in order to model Commonly
Encountered Problems.
Estimates of the model parameters
The estimates of the β coefficients are the values that minimize the sum of squared errors (SSE).
Mean squared error (MSE) measures the amount of error in statistical models. It assesses the average
squared difference between the observed and predicted values. An MSE of zero means the model makes
no errors.
As the data points fall closer to the regression line, the model has less error, decreasing the MSE. A
model with less error produces more precise predictions.
31.
Calculate Linear Multivariate Regression in order to model Commonly
Encountered Problems.
Estimates of the model parameters
The formula for MSE is given by
MSE = (1/n) Σ (Yi − Ŷi)²
where Yi is the ith observed value, Ŷi is the corresponding predicted value, and n is the number of
observations.
MSE can be found by taking each observed value, subtracting the predicted value, and squaring that
difference. Repeat that for all observations. Then sum all of those squared values and divide by the
number of observations.
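The steps above can be sketched directly in Python; the observed and predicted values below are made up for illustration:

```python
# MSE = (1/n) * sum((y_i - yhat_i)^2), following the steps described above.
import numpy as np

observed = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

# subtract, square, average
mse = np.mean((observed - predicted) ** 2)
print(mse)  # (0.25 + 0 + 2.25 + 1.0) / 4 = 0.875
```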
32.
Calculate Linear Multivariate Regression in order to model Commonly
Encountered Problems.
Interpretation of Model Coefficients
b0 represents the mean response when all independent variables are ‘0’.
b1 represents the change in mean response per unit increase in x1 when the remaining variables are held
constant.
As in simple linear regression, the curves are fitted and residual analysis is performed.
34.
Establishing its Usefulness for Prediction
Uses data very efficiently and can make useful predictions
Allows the statistician to explore the effect of more than one variable on the outcome
Gives optimal solutions
Multiple regression offers a degree of flexibility.
35.
Identify Logistic regression models and apply those models to
common business problems
Logistic regression is a data analysis technique that uses mathematics to find the relationship between
two data factors.
It then uses this relationship to predict the value of one of those factors based on the other.
This regression was first used in the biological sciences and was later applied in the social sciences.
Assumptions
The dependent variable must be categorical in nature.
The independent variables should not have multicollinearity.
36.
Identify Logistic regression models and apply those models to
common business problems
The logistic regression equation can be obtained from the linear regression equation
y = b0 + b1x1 + b2x2 + ……….. + bnxn
In logistic regression ‘y’ must lie between the two values 0 and 1. So we first form the odds by dividing y
by 1 − y, i.e. y / (1 − y), which ranges from 0 to ∞.
As we need the range between −∞ and +∞, we take the log of the above ratio.
Therefore the logistic regression can be written as
log( y / (1 − y) ) = b0 + b1x1 + b2x2 + ……….. + bnxn
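The transform described above can be sketched in Python: the linear predictor z (any real number) maps to a probability in (0, 1) via the sigmoid, and the log-odds recover z. The coefficients below are invented for illustration:

```python
# The logit/sigmoid relationship behind logistic regression.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = -1.0, 2.0  # illustrative coefficients
x = 0.8
z = b0 + b1 * x                # linear part, any real number
p = sigmoid(z)                 # probability between 0 and 1
logit = math.log(p / (1 - p))  # log-odds: equals the linear part again
print(round(p, 4), round(logit, 4))  # p ≈ 0.6457; the log-odds recover z = 0.6
```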
37.
Types of Logistic Regression
Binomial – two possible types of dependent variable (0 or 1; Yes or No; Pass or Fail)
Multinomial – three or more possible unordered types of dependent variable (cats, dogs or sheep)
Ordinal – three or more possible ordered types of dependent variable (Small, Medium or Large)
Advantages
Logistic regression is very easy to understand
It requires less training
It performs well for simple data sets and data sets that are linearly separable
It is less likely to be overfitted.
38.
Forecasting in Time
Time series analysis is a way of analyzing a sequence of data points collected over an interval of time.
In particular, a time series allows one to see what factors influence certain variables from period to
period.
For example, the rise and fall of the temperature over the course of a day.
Data collected irregularly or only once are not a time series.
It helps to predict the future behavior of a variable based on past experience.
It is helpful in business planning, as it allows comparing the actual current performance with the expected one.
It shows changes in the values of different variables at different times or places.
39.
Identify the components of time forecast in order to predict future
values from a model
Trend – variation over a long period of time
Increase, decrease or stable
Eg: population, agricultural production, number of births and deaths
Seasonal – rhythmic variations
Regular, periodic and fixed variations
Less than one year
Daily, weekly, quarterly, half-yearly
Eg: sales in seasons
Cyclic – more than one year
One complete swing is a cycle
Not periodic
Random or Irregular – purely random in nature
Uncontrollable, unpredictable and erratic
Eg: natural disasters
40.
Differentiate Seasonal Variation from trends in order to improve
prediction of future values from a model
Graphical – simple, no mathematical calculation, can be used even if the trend is not linear
It is a subjective method: the trend values differ from analyst to analyst.
Semi Averages – simple and easy to understand, does not require many calculations
Used only for linear trends; affected by extreme values
Moving Averages – can be easily applied, useful for periodic variations, gives the same value for
any analyst
Less effective for non-periodic data; period selection is very important;
values for the first and last few years cannot be found
Method of Least Squares – eliminates personal bias, trend values for the whole given time
period are obtained, enables forecasting of future values
Calculations are difficult; the addition of new observations requires
recalculation; ignores cyclical, seasonal and irregular variations;
ideal for the near future only
41.
Differentiate Seasonal Variation from trends in order to improve
prediction of future values from a model
Simple Average Methods
Ratio to Trend Method
Percentage moving Average Method
Link Relative Method
42.
Calculate seasonal indices so that seasonal variations can be
quantified in the model
Simple Average Methods
Ratio to Trend Method
Percentage moving Average Method
Link Relative Method
43.
Simple Average Method
The time series data for each of the 4 seasons of a particular year are expressed as
percentages of the seasonal average for that year.
The percentages for the different seasons are averaged over the years using a simple
average, and from these the required seasonal indices are determined.
Method for calculating seasonal indices
The data are arranged season-wise.
The data for each season are added over all the years and the seasonal average
for each season is computed.
The average of the seasonal averages (the grand average) is calculated.
44.
Simple Average Method
The seasonal average for each season is divided by the grand average and the results are
expressed as percentages; these are called seasonal indices.
Grand Average = Total of Seasonal Averages / 4
Seasonal Index = (Seasonal Average / Grand Average) × 100
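The method can be sketched in Python on invented quarterly data (rows are years, columns are the 4 seasons); note the indices always sum to 400 for 4 seasons:

```python
# Simple average method: seasonal averages over the years, a grand average,
# and seasonal indices as percentages (illustrative data only).
import numpy as np

data = np.array([
    [30, 40, 36, 34],
    [34, 52, 50, 44],
    [40, 58, 54, 48],
], dtype=float)  # rows = years, columns = quarters

seasonal_avg = data.mean(axis=0)      # average of each season over the years
grand_avg = seasonal_avg.sum() / 4    # total of seasonal averages / 4
seasonal_index = seasonal_avg / grand_avg * 100
print(seasonal_index.round(1))  # 80.0, 115.4, 107.7, 96.9 — they sum to 400
```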
45.
Ratio to Trend Method
This method is used when the cyclical variations are absent from the data, that is, the
time series variable Y consists of trend, seasonal and random components:
Y = T · S · R
Steps to calculate seasonal variations
Obtain the trend values for each month or quarter using the method of least squares.
Divide the original values by the corresponding trend values. This eliminates the
trend component from the data.
To get figures in percentages, the quotients are multiplied by 100.
46.
Percentage Moving Average Method
It is the most commonly used method of measuring seasonal variations.
This method assumes all 4 components of the time series are present.
Steps to calculate seasonal variations
Compute the moving averages with period equal to the period of the seasonal variations.
This will eliminate the seasonal effect and minimize the random component.
The original values for each quarter (or month) are divided by the respective moving-average
figures, and the quotients are multiplied by 100 to express the ratios as percentages.
47.
Link Relatives Method
This method finds the seasonal indices in the form of chain relatives.
The link relatives are percentages of the current period as compared with the previous period.
Steps of the Link Relatives Method
Find the link relatives of all the seasonal data by the formula LR1 = (m1 / m0) × 100.
Arrange all the link relatives thus obtained season-wise and find the seasonal average of
the link relatives, either by mean or by median.
Convert each of the averages of the link relatives into chain relatives by the formula
CR of a season = (LR of the season × CR of the preceding season) / 100.
48.
Link Relatives Method
Findthe revised chain relative of the foremost season on the basis of the chain relative of
the last season by the formula
Determine the constants correcting the chain relatives by d =
Deduction of the correction factor d, after being multiplied by 1,2, 3 (and so an)
respectively from the chain relatives of the 2nd, 3rd, 4th and so on seasons, and thereby
find the preliminary indices of the seasonal variations.
49.
Measuring Distance
Clustering consists of grouping objects that are similar to each other; it can be used
to decide whether two items are similar or dissimilar in their properties.
In a data mining sense, the similarity measure is a distance whose dimensions describe
object features. If the distance between two data points is small, there is
a high degree of similarity between the objects, and vice versa.
The number of input variables or datasets is called dimensionality
50.
Identify the meaning of dimensionality and its implications in
analytics
A dimension is a structure that categorises facts and measures in order to enable users to
answer business questions.
The number of input variables or datasets is called dimensionality.
Common dimensionality-reduction techniques include:
Missing-values ratio
Low-variance filter
Decision trees
Random forest
High-correlation filter
Backward feature elimination
51.
Calculate different types of distances and identify
scenarios when each type is applicable
The different types of distances are
1. Minkowski Distance
2. Euclidean Distance
3. Manhattan Distance
4. Cosine Distance
5. Jaccard distance
6. Hamming Distance
52.
Minkowski Distance
Minkowski distance is a distance measured between two points in
N-dimensional space.
It is widely used in the field of machine learning, especially in the
classification of data.
Minkowski distance is used in algorithms such as K-Nearest
Neighbors, K-Means clustering, etc.
Satisfying conditions
- Non-negativity: d(x,y) >= 0
- Identity: d(x,y) = 0 if and only if x = y
53.
Minkowski Distance
- Symmetry: d(x,y) = d(y,x)
- Triangle inequality: d(x,y) + d(y,z) >= d(x,z)
Let the points be
P1: (x1, x2, ……… xn) & P2: (y1, y2, …… yn)
The formula for the Minkowski distance is given by
D(P1, P2) = ( Σ |xi − yi|^p )^(1/p)
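As a sketch, the Minkowski distance (the p-norm form ( Σ|xi − yi|^p )^(1/p)) can be computed directly; p = 1 gives the Manhattan distance and p = 2 the Euclidean distance:

```python
# Minkowski distance between two points; p = 1 -> Manhattan, p = 2 -> Euclidean.
def minkowski(p1, p2, p):
    return sum(abs(a - b) ** p for a, b in zip(p1, p2)) ** (1.0 / p)

a, b = (1.0, 2.0), (4.0, 6.0)
print(minkowski(a, b, 1))  # Manhattan: 3 + 4 = 7
print(minkowski(a, b, 2))  # Euclidean: sqrt(9 + 16) = 5
```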
54.
Euclidean Distance
Euclidean distance is considered the traditional metric for problems
with geometry (it is the Minkowski distance with p = 2).
It is one of the most used metrics in cluster analysis. One of
the algorithms that uses this formula is K-means.
Manhattan Distance
This is also known as taxicab distance or city block distance (the
Minkowski distance with p = 1).
The distance between two points is the sum of the absolute
differences between their coordinates.
Nearest-neighbor methods have the advantage that they are easy
to implement.
Jaccard Distance
The Jaccard distance measures the dissimilarity of two data set
items.
The Jaccard distance is given by subtracting the Jaccard index from
1. The Jaccard index is the ratio of the intersection to the union of
the sets.
59.
Cosine Distance
The cosine distance is a measure of dissimilarity between two data
set items. It is given by
cosine distance = 1 − cos θ = 1 − (A · B) / (‖A‖ ‖B‖)
60.
Hamming Distance
Hamming distance is a metric for comparing two binary data
strings (or, more generally, two equal-length strings).
It is the number of positions in which the two strings differ.
It gives the result of how many attributes are different.
It is used in error detection and correction techniques in data
communications.
Eg: ABCDE and AGDDF (Hamming distance = 3)
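The three measures above can be sketched with small stdlib-only functions, including the ABCDE/AGDDF example:

```python
# Jaccard, cosine, and Hamming distances.
import math

def jaccard_distance(s1, s2):
    # 1 - (intersection / union) of two sets
    return 1 - len(s1 & s2) / len(s1 | s2)

def cosine_distance(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return 1 - dot / norm

def hamming_distance(s1, s2):
    # number of positions at which the two strings differ (equal length assumed)
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 1 - 2/4 = 0.5
print(hamming_distance("ABCDE", "AGDDF"))      # 3 positions differ
```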
61.
Unit - 2
A subset of artificial intelligence known as machine learning
focuses primarily on the creation of algorithms that enable a
computer to independently learn from data and previous
experiences.
Supervised Learning
• Supervised learning is the type of machine learning in which
machines are trained using well "labelled" training data, and on the
basis of that data, machines predict the output. Labelled data
means the input data is already tagged with the correct
output.
• Supervised machine learning algorithms can be broadly
classified into regression and classification algorithms. Regression
algorithms predict the output for continuous values; to predict
categorical values, we need classification algorithms.
65.
Classification
• The Classification algorithm is a Supervised Learning technique that is used
to identify the category of new observations on the basis of training data.
In classification, a program learns from the given dataset or observations
and then classifies new observations into a number of classes or groups,
such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can
be called targets/labels or categories.
• Binary Classifier: If the classification problem has only two possible
outcomes, then it is called as Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG,
etc.
• Multi-class Classifier: If a classification problem has more than two
outcomes, then it is called as Multi-class Classifier.
Example: Classifications of types of crops, Classification of types of music.
KNN Algorithm
• K-Nearest Neighbour is one of the simplest Machine Learning
algorithms, based on the Supervised Learning technique.
• The K-NN algorithm assumes similarity between the new
case/data and the available cases and puts the new case into the
category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new
data point based on similarity. This means that when new data
appears, it can be easily classified into a well-suited category
by using the K-NN algorithm.
• The K-NN algorithm can be used for Regression as well as for
Classification, but mostly it is used for Classification problems.
69.
KNN Algorithm
• K-NN is a non-parametric algorithm, which means it does not
make any assumption on underlying data.
• It is also called a lazy learner algorithm because it does not learn
from the training set immediately instead it stores the dataset
and at the time of classification, it performs an action on the
dataset.
• KNN algorithm at the training phase just stores the dataset and
when it gets new data, then it classifies that data into a category
that is much similar to the new data.
How does K-NN work?
• Step-1: Select the number K of the neighbors
• Step-2: Calculate the Euclidean distance of K number of neighbors
• Step-3: Take the K nearest neighbors as per the calculated
Euclidean distance.
• Step-4: Among these k neighbors, count the number of the data
points in each category.
• Step-5: Assign the new data point to the category for which the
number of neighbors is maximum.
• Step-6: Our model is ready.
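The steps above can be sketched with a tiny, stdlib-only implementation (the training points and labels are invented for illustration):

```python
# K-NN following the steps above: Euclidean distance to every training point,
# take the K nearest, and vote by majority.
import math
from collections import Counter

train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
         ((5.0, 5.0), "B"), ((6.0, 5.5), "B"), ((5.5, 6.0), "B")]

def knn_predict(point, k):
    # Step 2-3: sort training points by Euclidean distance, keep the k nearest
    nearest = sorted(train, key=lambda item: math.dist(point, item[0]))[:k]
    # Step 4-5: count labels among the neighbors and take the majority
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1.2, 1.4), k=3))  # "A" — two of the three nearest are class A
print(knn_predict((5.4, 5.4), k=3))  # "B" — all three nearest are class B
```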
Advantages of the KNN Algorithm:
• It is simple to implement.
• It is robust to the noisy training data
• It can be more effective if the training data is large.
74.
Disadvantages of the KNN Algorithm:
• The value of K always needs to be determined, which may be
complex at times.
• The computation cost is high because of calculating the distance
between the data points for all the training samples.
75.
Calculate the KNN algorithm with a fixed or variable number of
k and assess the quality of the result
KNN can be used for both classification and regression problems,
but its widest use is in classification. To evaluate any problem, we
generally look at 3 important aspects.
Across these parameters of consideration, the KNN classifier fares well.
It is commonly used for its ease of interpretation and low calculation time.
76.
Calculate the KNN algorithm with a fixed or variable number of
k and assess the quality of the result
The "K" in the KNN algorithm is the number of nearest neighbors we
wish to take the vote from. The choice of the parameter K is crucial
in this algorithm.
In the example the value of k=3,
Likewise the analysis is performed
for different values of k.
77.
Calculate the KNN algorithm with a fixed or variable number of
k and assess the quality of the result
K-Value vs Training Error
1. At K=1 the error rate is
always zero for the training
sample, because
the closest point to any
training data point is itself.
2. As the K value increases, the
training error rate also increases,
and hence a high value of k
is not recommended.
78.
Calculate the KNN algorithm with a fixed or variable number of
k and assess the quality of the result
K-Value vs Validation Error
1. At K=1 the validation error
rate is high; it then falls to
a minimum value and
thereafter starts to
increase again.
2. The training and validation
error curves are compared,
and the optimal value of k
is determined.
79.
Support Vector Machine (SVM)
Support Vector Machine (SVM) is a powerful machine learning
algorithm used for linear or nonlinear classification, regression,
and even outlier detection tasks.
SVMs can be used for a variety of tasks, such as text classification,
image classification, spam detection, handwriting identification,
gene expression analysis, face detection, and anomaly detection.
SVMs are adaptable and efficient in a variety of applications
because they can manage high-dimensional data and nonlinear
relationships.
80.
Identify the basics of the SVM classification algorithm
(or) Basics of the SVM algorithm
The main goal of SVM is to separate the given data in the best
possible way. The basic terminologies of SVM are as follows:
Hyperplane: Hyperplane is the decision
boundary that is used to separate the data
points of different classes.
Support Vectors: Support vectors are the
closest data points to the hyperplane.
Margin: Margin is the distance between the
support vector and hyperplane. The main
objective of the support vector machine
algorithm is to maximize the margin
81.
Identify the basics of the SVM classification algorithm
(or) Basics of the SVM algorithm
• Kernel: a kernel is a mathematical function used in SVM
to transform data points that are not linearly separable
into a space where they are.
Linear kernel
Polynomial kernel
Radial kernel
• Hard Margin: the maximum-margin hyperplane, or
hard-margin hyperplane, is a hyperplane that properly
separates the data points of different categories without
any misclassifications.
• Soft Margin: when the data is not perfectly separable or
contains outliers, SVM permits a soft-margin technique.
82.
Identify what a support vector is and its use in the
classification algorithm (or) Uses of SVM
The main idea of SVM is to find the optimal hyperplane (Line in 2D,
Plane in 3D & Hyperplane in more than 3D).
Support Vectors are equidistant from the hyperplane. They are
called support vectors because if their position shifts, the
hyperplane shifts as well. This means that the hyperplane depends
only on the support vectors and not on any other observations.
Application or uses
Face Detection: SVM classifies the parts of an image as face and
non-face and creates a boundary around the face.
83.
Identify what a support vector is and its use in the
classification algorithm (or) Uses of SVM
Text & Hypertext Categorization: SVM allows text and hypertext categorization
for both inductive and transductive models.
Image Classification: SVM provides better search accuracy for image
classification.
Bio-Informatics: this includes protein classification and cancer classification. We
use SVM to classify genes, and to classify patients on the basis of their genes
and other biological problems.
Handwriting Recognition: SVM is used to detect handwriting patterns.
Calculate SVM to solve common business problems
Accuracy = (Number of correct predictions / Total number of predictions) × 100
Classification Error = 100 − Accuracy
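These two metrics can be sketched directly; the spam/ham labels below are invented for illustration:

```python
# Accuracy = (correct predictions / total predictions) * 100,
# classification error = 100 - accuracy.
def accuracy(actual, predicted):
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual) * 100

actual    = ["spam", "ham", "spam", "ham", "spam"]
predicted = ["spam", "ham", "ham",  "ham", "spam"]

acc = accuracy(actual, predicted)
print(acc, 100 - acc)  # accuracy 80.0, classification error 20.0
```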
87.
Identify the steps to build a decision tree classifier
- It is a supervised Machine learning Algorithm
- Used for both regression and classification
- Builds decision nodes at each step
- Based on tree based model.
88.
Identify the steps to build a decision tree classifier
Root Node: It is the topmost node in the
tree, which represents the complete
dataset. It is the starting point of the
decision-making process.
Decision Node: A node that symbolizes a
choice regarding an input feature. It is
derived from parent node getting divided.
Leaf Node: A node without any child
nodes
[Diagram: a Root Node splits (Yes/No) into Internal Nodes (also called Child or
Decision Nodes), which split again (Yes/No) into Leaf (Terminal) Nodes]
89.
Identify the steps to build a decision tree classifier
Example tree:
Age < 30?
├─ Yes → Eat Pizza?
│   ├─ Yes → Unfit
│   └─ No → Fit
└─ No → Exercise?
    ├─ Yes → Fit
    └─ No → Unfit
90.
Identify the steps to build a decision tree classifier
Step 1: A tree can be "learned" by splitting the source set into
subsets based on Attribute Selection Measures (ASM).
Step 2: This process is repeated on each derived subset in a
recursive manner, called recursive partitioning.
Step 3: The recursion is complete when all records in the subset at
a node have the same value of the target variable.
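Step 1's attribute selection can be sketched with a small, stdlib-only information-gain calculation; the toy age/exercise data is invented for illustration:

```python
# Choosing a split that maximizes information gain (an Attribute Selection
# Measure): entropy before the split minus weighted entropy after it.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature, threshold):
    left  = [l for r, l in zip(rows, labels) if r[feature] <  threshold]
    right = [l for r, l in zip(rows, labels) if r[feature] >= threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - after

rows   = [[25, 0], [22, 0], [28, 1], [45, 1], [50, 0], [60, 1]]  # [age, exercises]
labels = [0, 0, 1, 1, 0, 1]                                      # 0 = unfit, 1 = fit

# splitting on "exercises" at 0.5 separates the classes perfectly
print(round(info_gain(rows, labels, feature=1, threshold=0.5), 3))  # 1.0
```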
91.
Identify the steps to build a decision tree classifier
Advantages
1. Can be used for both classification and regression
2. Easy to interpret
3. No need for normalization or scaling
4. Not sensitive to outliers
Disadvantages
1. Overfitting issues
2. Small changes in the data alter the tree structure, causing
instability
3. Training time is relatively high
92.
Use a decision tree algorithm and appropriate metrics to solve a business problem and assess the quality of the solution
Entropy: A quantitative measure of the randomness of the
information being processed.
A high entropy value means high randomness, making accurate
prediction very difficult.
A low entropy value allows an accurate prediction.
93.
Use a decision tree algorithm and appropriate metrics to solve a business problem and assess the quality of the solution
Information Gain: A measure of how much information a feature
provides about a class.
It is the difference between the entropy before a split and the
weighted average entropy after the split.
A high entropy value corresponds to low information gain;
a low entropy value corresponds to high information gain.
94.
Use a decision tree algorithm and appropriate metrics to solve a business problem and assess the quality of the solution
Gini Impurity: A measure of the impurity at a node.
A split in a decision tree is said to be pure if all the data
points are accurately separated into their classes.
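The three metrics above can be computed side by side on a small made-up label set (a sketch; the example values are chosen so the results are easy to check by hand):

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy: randomness of the class labels at a node
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    # Entropy before the split minus weighted average entropy after it
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

def gini(labels):
    # Gini impurity: 0 for a pure node
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

parent = ["yes"] * 5 + ["no"] * 5
pure_split = [["yes"] * 5, ["no"] * 5]
print(entropy(parent))                       # 1.0 (maximum for two classes)
print(information_gain(parent, pure_split))  # 1.0 (a pure split recovers everything)
print(gini(["yes"] * 5))                     # 0.0 (pure node)
```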
95.
Unit – 3: Clustering
Unsupervised Learning
Unsupervised learning is a type of machine learning that learns
from unlabeled data. This means that the data does not have any
pre-existing labels or categories. The goal of unsupervised learning
is to discover patterns and relationships in the data without any
explicit guidance. Here the task of the machine is to group
unsorted information according to similarities, patterns, and
differences without any prior training of data.
96.
Clustering
Clustering or cluster analysis is a machine learning technique that
groups an unlabelled dataset. It can be defined as "a way of grouping
the data points into different clusters, each consisting of similar
data points. The objects with possible similarities remain in a group
that has few or no similarities with another group."
Objects may be grouped by attributes such as shape, size, color, or
behavior, and each resulting group is assigned a cluster-ID.
Determine core aspects and types of clustering in order
to properly apply the algorithms to business problems.
Clustering Methods
1. Partitioning Method – centroid-based; requires a prespecified
number of clusters K (e.g. K-means).
2. Density-Based Method – connects regions of high density into
clusters, allowing arbitrarily shaped distributions.
3. Model-Based Method – assumes the data is generated by a model,
e.g. a mixture of Gaussian distributions, and groups points
accordingly.
4. Hierarchical Method – no need to prespecify K; builds a tree
(dendrogram) via agglomerative or divisive approaches.
100.
Apply various clustering algorithms to data sets in order
to solve common applicable business problems.
K-Means clustering algorithm
A centroid-based algorithm, where K is the predefined number of
clusters.
Tasks: choose the best value of K; each data point joins the
cluster of its closest center.
101.
Apply various clustering algorithms to data sets in order
to solve common applicable business problems.
Working of the K-Means clustering algorithm
• Step 1: Select the number K to decide the number of clusters.
• Step 2: Select K random points as centroids (they need not come
from the input dataset).
• Step 3: Assign each data point to its closest centroid, forming
the predefined K clusters.
• Step 4: Calculate the variance and place a new centroid in each
cluster.
• Step 5: Repeat step 3, reassigning each data point to the new
closest centroid.
• Step 6: If any reassignment occurred, go to step 4; otherwise
finish.
• Step 7: The model is ready.
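The steps above can be sketched as a minimal 1-D K-means (an illustration only; real workloads use multi-dimensional data and a library implementation):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # Minimal 1-D K-means sketch following the steps above
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # Step 2: random initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # Step 3: assign to closest centroid
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        new = [sum(c) / len(c) if c else centroids[i]   # Step 4: recentre
               for i, c in enumerate(clusters)]
        if new == centroids:                   # Step 6: no change -> finished
            break
        centroids = new                        # Step 5: repeat with new centroids
    return centroids, clusters

centroids, clusters = kmeans([1, 2, 3, 10, 11, 12], k=2)
print(sorted(centroids))  # the two centres settle at 2 and 11
```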
102.
Apply various clustering algorithms to data sets in order
to solve common applicable business problems.
Choosing the optimum value of K (the elbow method)
Within-Cluster Sum of Squares (WCSS) measures the total variation
within the clusters:
WCSS = Σ distance(Pi, C1)² + Σ distance(Pi, C2)² + Σ distance(Pi, C3)²
• For each value of K, calculate the WCSS value.
• Plot a curve of the calculated WCSS values against the number of
clusters K.
• The sharp bend in the plot, where the curve looks like an arm,
marks the best value of K.
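The WCSS formula can be evaluated directly on hand-made clusterings of the same 1-D data, which makes the elbow visible in the numbers themselves (a sketch with made-up points):

```python
def wcss(clusters, centroids):
    # Within-Cluster Sum of Squares: squared distance of every point
    # to its own cluster's centroid, summed over all clusters
    return sum((p - c) ** 2 for pts, c in zip(clusters, centroids) for p in pts)

points = [1, 2, 3, 10, 11, 12]
# Hand-made clusterings for K = 1, 2, 3 on the same data
print(wcss([points], [6.5]))                            # 125.5
print(wcss([[1, 2, 3], [10, 11, 12]], [2, 11]))         # 4
print(wcss([[1], [2, 3], [10, 11, 12]], [1, 2.5, 11]))  # 2.5
# The sharp drop from K=1 to K=2, then the flattening, is the "elbow":
# K=2 is the best value here.
```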
103.
Apply various clusteringalgorithms to data sets in order
to solve common applicable business problems.
104.
Hierarchical Clustering Algorithm
A hierarchical clustering method works by grouping data into a
tree of clusters.
Hierarchical clustering begins by treating every data point as a
separate cluster.
It iteratively combines the closest clusters until a stopping
criterion is reached.
The result of hierarchical clustering is a tree-like structure,
called a dendrogram.
Hierarchical Clustering Algorithm
Agglomerative Clustering Algorithm
1. Consider every data point as an individual cluster.
2. Calculate the similarity of each cluster with all the other
clusters (compute the proximity matrix).
3. Merge the clusters that are most similar or closest to each
other.
4. Recalculate the proximity matrix for the merged clusters.
5. Repeat steps 3 and 4 until only a single cluster remains.
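The agglomerative steps can be sketched in a few lines for 1-D points, using single-linkage distance (the minimum gap between two clusters) as the proximity measure — an illustrative choice, not the only one:

```python
def agglomerative(points, target_k):
    # Minimal single-linkage agglomerative clustering sketch (1-D)
    clusters = [[p] for p in points]        # Step 1: each point is a cluster
    while len(clusters) > target_k:         # Steps 2-5: merge closest, repeat
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-linkage proximity: smallest pairwise gap
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best                      # Step 3: merge the closest pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

result = agglomerative([1, 2, 3, 10, 11, 12], target_k=2)
print(result)  # the two natural groups {1,2,3} and {10,11,12}
```

Here the loop stops at a target number of clusters; cutting a dendrogram at a chosen height is the equivalent stopping criterion.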
Hierarchical Clustering Algorithm
Divisive Clustering Algorithm – Divisive hierarchical clustering is
precisely the opposite of agglomerative hierarchical clustering.
1. Consider all data points as a single cluster.
2. At every iteration, separate out the data points that are least
similar to the rest.
3. In the end we are left with N clusters.
Hierarchical Clustering Algorithm
Advantages
• The ability to handle clusters of different sizes and densities.
• The ability to handle missing data and noisy data.
• The ability to reveal the hierarchical structure of the data, which
can be useful for understanding the relationships among the
clusters.
111.
Hierarchical Clustering Algorithm
Disadvantages
• The need for a criterion to stop the clustering process and
determine the final number of clusters.
• The computational cost is high.
• The memory requirements of the method can be high.
112.
Optimization
Optimization:
The act of obtaining the best result under the given circumstances.
The design, construction, and maintenance of engineering systems
involve decision making at both the managerial and the
technological level.
The goal of such decisions is to minimize the effort required or to
maximize the desired benefit.
113.
Optimization
Optimization:
It is defined as the process of finding the conditions that give the
minimum or maximum value of a function, where the function
represents the effort required or the desired benefit.
114.
Optimization
Components of Optimization
1. Objective function f(x) – to maximise or minimise; conversion
between the two is possible, since max f(x) = −min(−f(x)).
2. Decision variables (x) – the quantities we can control in
min f(x) or max f(x).
3. Constraints – limit the decision variables to a certain set of
values.
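The three components can be made concrete with a deliberately tiny sketch: a brute-force grid search for the minimum of an objective over a constrained decision variable (the objective and bounds are made up for illustration):

```python
def minimize(f, lo, hi, steps=10_000):
    # Brute-force sketch: evaluate the objective f over a grid of the
    # decision variable x, restricted by the constraint lo <= x <= hi
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(xs, key=f)

f = lambda x: (x - 3) ** 2   # objective function f(x)
best = minimize(f, 0, 2)     # constraints: 0 <= x <= 2
print(best)                  # 2.0 -- the unconstrained minimum x=3 is infeasible
# Conversion: max f(x) is the same problem as -min(-f(x))
```

The constraint is what makes the answer 2.0 rather than 3: the feasible set excludes the unconstrained optimum.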
115.
Optimization
Application of Optimization
1.Design of structural units in construction, machinery, and in space vehicles.
2. Maximizing benefit/minimizing product costs in various manufacturing and
construction processes.
3. Optimal path finding in road networks.
4. Optimal production planning, controlling and scheduling.
5. Optimal Allocation of resources or services among several activities to
maximize the benefit.
116.
Optimization
Types of Optimization
1. Convex Optimization – all constraints are convex and the
objective function is convex if minimizing; linear or non-linear.
2. Non-Convex Optimization – the constraints and objective function
may be non-convex; continuous or discrete.
3. Optimization under uncertainty – the data is uncertain; robust
optimization.
117.
Identify the goals and constraints of a Linear
Optimization
Linear optimization is a method applicable for the solution of
problems in which the objective function and the constraints
appear as linear functions of the decision variables.
The constraint equations may be in the form of equalities or
inequalities.
It is useful in modeling the problems in planning, routing,
scheduling, assignment and design
118.
Identify the goals and constraints of a Linear
Optimization
Characteristics (standard form)
• The objective function is of the minimization or maximization
type.
• All the constraints are of the equality type (inequalities are
converted using slack variables).
• All the decision variables are non-negative.
119.
Identify the goals and constraints of a Linear
Optimization
Steps of Linear Optimization
• Identify the objective function & Decision Variables.
• Identify the constraints.
• Write down the optimization model.
• Solve graphically or manually, or else solve using R.
• Conduct sensitivity analysis.
• Interpret results and make recommendations
120.
Identify the goals and constraints of a Linear
Optimization
Goals of Linear Optimization
• To optimize an objective function.
• Decision variables (X) are set so as to either maximize or
minimize the objective function: for example, maximize profits,
minimize costs or time spent, or minimize risks.
• The optimization is with respect to a real-world quantity.
121.
Identify the goals and constraints of a Linear
Optimization
Constraints of Linear Optimization.
• Decision variables are physical quantities under the control of the
decision maker, represented by mathematical symbols. For example,
X1 might represent the number of kgs of product 1 produced in a
given month.
• The objective function is a mathematical function of the decision
variables that converts a solution into a numerical evaluation.
• Constraints are functional equalities or inequalities that represent
economic, technological, numerical, and other restrictions on the
decision variables.
122.
Calculation of Linear Optimization to solve some
common business problems
The linear optimization on business problems can be implemented
by using any of the two methods.
1. Simplex Method
2. Graphical Method
123.
Calculation of Linear Optimization to solve some
common business problems
Simplex Method
The simplex method was developed during the Second World War and
uses a very efficient approach.
Step 1: Write the objective function and the inequality constraints.
Step 2: Convert the inequalities into equations by adding one slack
variable for each inequality.
Step 3: Construct the initial simplex tableau. Each inequality
constraint appears in its own row, with the objective function as
the bottom row.
Step 4: The most negative entry in the bottom row identifies the
pivot column.
124.
Calculation of Linear Optimization to solve some
common business problems
Step 5: Calculate the quotients. The smallest quotient identifies a
row. The element at the intersection of the column identified in
step 4 and the row identified in this step is the pivot element.
Step 6: Perform pivoting to make all other entries in this column
zero.
Step 7: When there are no more negative entries in the bottom row,
we are finished; otherwise, start again from step 4.
Step 8: Read off the answers.
125.
Calculation of Linear Optimization to solve some
common business problems
Example. Maximize Z = 40x1 + 30x2
subject to:
x1 + x2 ≤ 12
2x1 + x2 ≤ 16
x1 ≥ 0; x2 ≥ 0
After adding slack variables y1 and y2, the objective function and
constraints become:
−40x1 − 30x2 + Z = 0
x1 + x2 + y1 = 12
2x1 + x2 + y2 = 16
x1 ≥ 0; x2 ≥ 0
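A linear program attains its optimum at a corner (vertex) of the feasible region, so the example can be checked by evaluating Z at each corner. This sketch assumes the constraints are x1 + x2 ≤ 12 and 2x1 + x2 ≤ 16 with Z = 40x1 + 30x2:

```python
def z(x1, x2):
    # Objective function Z = 40*x1 + 30*x2
    return 40 * x1 + 30 * x2

# Corner points of the feasible region defined by
# x1 + x2 <= 12, 2*x1 + x2 <= 16, x1 >= 0, x2 >= 0
corners = [(0, 0), (8, 0), (0, 12), (4, 8)]  # (4, 8): both constraints bind
best = max(corners, key=lambda p: z(*p))
print(best, z(*best))  # (4, 8) 400
```

The simplex method reaches the same vertex by pivoting rather than by enumerating corners, which matters when the number of vertices is large.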
Simulation
Data simulation is the process of generating synthetic data that
closely mimics the properties and characteristics of real-world
data.
It is used to predict future instances, determine the best course
of action, or validate a model.
It is used to test new ideas before making a business decision.
It has the advantage that data need not be collected from surveys.
128.
Features of Simulation
Flexibility:
Since the data is manufactured, it can be adjusted to simulate a wide range of scenarios and
conditions without any constraint, allowing a system to be studied in more depth. This is
particularly useful when testing out large-scale simulation models and predictive models.
Scalability:
In addition to data quality, data volume plays a critical role in training machine learning and
artificial intelligence models. The scalability of simulated data elevates its value for such use
cases—since the data is artificial, it can be generated as needed to reflect the randomness and
complexity of real-world systems.
Replicability:
Similar circumstances and conditions can be reproduced in a different simulated dataset to
ensure consistency in testing. This consistency is crucial for validating models and hypotheses,
as it allows you to test them repeatedly and refine them based on the results.
129.
Advantages, Benefits, and Use Cases of Simulation
Enhanced Decision Making:
Data simulation can inform decision-making by simulating various conditions or events and predicting
outcomes based on actions.
Cost Efficiency:
Simulated data is more cost-effective, as it reduces the need for physical testing and active data
collection.
Improved Model Validity
Data simulation can aid in model testing and refinement. Creating a virtual representation of a
real-world system makes it possible to test different models and refine them based on the results,
leading to more accurate models that are better at predicting scenarios in great detail.
Risk Reduction
Data simulation can provide data on crises and potential issues, allowing organizations to identify
pitfalls or challenges before they occur in the real world. This helps avoid risks in the long run.
130.
Role of simulation in predictive analytics
Manufacturing – To optimize production processes, improve maintenance
scheduling, plan inventory, and more.
Healthcare – In clinical trials, pharmaceutical market analysis, and more.
Oil & Gas – for operations planning, field production optimization, storage
management, and more.
Business Processes – for optimization, investment analysis, impact analysis, and
more.
Supply Chain – for design, planning, sourcing optimization, inventory
management, transportation planning, risk management, and more.
131.
Types of simulation
Monte Carlo simulations: use random sampling to obtain results for
uncertain situations; widely used in finance, physics, and
engineering to model complex systems and predict behavior.
Agent-based modeling: this type of simulation focuses on the actions
and interactions of individual, autonomous agents within a system.
System dynamics: helps to understand non-linear feedback loops in
more complex systems; often used in economics, environmental
science, etc.
132.
Types of simulation
• Discrete-event simulations: these models focus on individual
events in the system and how they affect the outcome; widely used
in operations research, computer science, and logistics to
simulate processes and systems.
133.
Monte Carlo Simulation
• Monte Carlo simulation is a technique in which random numbers are
substituted into a statistical model in order to forecast future values.
Step 1: Define the problem
Define it, identify the objectives, and find the factors that
impact the objectives.
Step 2: Construct an appropriate model
Choose the variables and parameters, appropriate decision rules,
the type of distribution, and the relation between variables and
parameters.
134.
Monte Carlo Simulation
• Step 3: Prepare the model for experimentation
Set the initial conditions and the number of runs.
Step 4: Experiment with the model
Correlate the factors, select random numbers, and associate the
random numbers with the factors.
Step 5: Summarize and examine the results
Step 6: Evaluate the results of the simulation.
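A classic illustration of substituting random numbers into a model (step 4) is estimating π by sampling the unit square — a sketch, where the problem definition, model, and number of runs correspond to steps 1 through 3:

```python
import random

def estimate_pi(runs, seed=0):
    rng = random.Random(seed)         # fixed seed so the experiment repeats exactly
    inside = 0
    for _ in range(runs):             # Step 4: substitute random numbers into the model
        x, y = rng.random(), rng.random()
        inside += x * x + y * y <= 1  # point falls inside the quarter circle
    return 4 * inside / runs          # Step 5: summarize the result

print(estimate_pi(100_000))  # close to pi = 3.14159...
```

More runs shrink the error (step 6 here is comparing the estimate against the known value of π), which is exactly the summarize-and-evaluate loop the slides describe.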
135.
Naïve Bayes algorithm
• Bayes' theorem is named after Thomas Bayes, an 18th-century
statistician. The Naive Bayes classifier works on the principle of
conditional probability, as given by Bayes' theorem.
• Consider the following example of tossing two coins. If we toss
two coins and look at all the different possibilities, we have the
sample space as: {HH, HT, TH, TT}
136.
Naïve Bayes algorithm
• The probability of getting two heads = 1/4
• The probability of at least one tail = 3/4
• The probability of the second coin being a head given the first
coin is a tail = 1/2
• The probability of getting two heads given the first coin is a
head = 1/2
137.
Naïve Bayes algorithm
• Bayes' theorem gives us the conditional probability of event A,
given that event B has occurred. In this case, the first coin toss
is B and the second coin toss is A.
• In this sample space, let A be the event that the second coin is
a head, and B be the event that the first coin is a tail.
• We focus on A, and write it as the probability of A given B:
P(A|B) = [ P(B|A) * P(A) ] / P(B)
138.
Naïve Bayes algorithm
• = [ P(first coin being a tail given the second coin is a head) *
P(second coin being a head) ] / P(first coin being a tail)
• = [ (1/2) * (1/2) ] / (1/2)
• = 1/2 = 0.5
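The arithmetic above can be verified by enumerating the four-outcome sample space directly (a small sketch using exact fractions):

```python
from fractions import Fraction

space = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]  # (first, second) toss

def p(event):
    # Probability of an event as a count over the equally likely outcomes
    return Fraction(sum(1 for s in space if event(s)), len(space))

p_a = p(lambda s: s[1] == "H")                    # A: second coin is a head
p_b = p(lambda s: s[0] == "T")                    # B: first coin is a tail
p_b_given_a = p(lambda s: s == ("T", "H")) / p_a  # P(B|A) = P(A and B) / P(A)
p_a_given_b = p_b_given_a * p_a / p_b             # Bayes: P(A|B) = P(B|A)P(A)/P(B)
print(p_a_given_b)  # 1/2
```

The result matches the hand calculation: because the two tosses are independent, conditioning on the first coin does not change the probability for the second.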