FBA-PPTs-sssion-17-20 .pptx

Course Title: Fundamentals of Data Analytics
Course Code: MFDA21205
Instructor: Ms. Sudeshna Sani, Asst. Professor, School of Business.

Session-17-18
Machine Learning Models for Data Analytics
• Classification: Decision Tree
• Regression: Linear regression

Machine Learning
Machine learning is a subfield of artificial intelligence (AI) that
focuses on developing algorithms and models that enable
computers to learn and make decisions or predictions without
being explicitly programmed. In other words, machine learning
allows systems to automatically learn and improve from
experience without human intervention.
This is achieved through the use of various techniques and algorithms
that enable machines to learn patterns, relationships, and rules from
data.

Supervised Learning Algorithms
• Linear Regression
• Logistic Regression
• Decision Trees
• Random Forests
• Support Vector Machines (SVM)
• Naive Bayes
• K-Nearest Neighbors (KNN)
• Neural Networks
• Gradient Boosting Algorithms

Types of Unsupervised Learning
• Clustering Algorithms
  • K-Means Clustering
  • Hierarchical Clustering
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
• Dimensionality Reduction Algorithms
  • Principal Component Analysis (PCA)
  • t-SNE (t-Distributed Stochastic Neighbor Embedding)
• Association Rule Mining

Regression: Linear regression
Linear regression is a supervised machine learning
algorithm used for regression tasks. It models a
target (dependent) variable as a linear function of
one or more independent variables. It is mostly
used for finding the relationship between
variables and for forecasting.
Linear regression estimates how the dependent
variable changes when the independent variables
change. For example, it can be used to predict
the price of houses.
As another example, suppose the weight of a
person is linearly related to their height. Under
this relationship, as height increases, the
predicted weight also increases.
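To make the height and weight example concrete, here is a minimal sketch of fitting a straight line by ordinary least squares in plain Python. The height and weight values, the helper name `fit_line`, and the 175 cm query point are all invented for illustration; they are not from the course material.

```python
# Hypothetical (height in cm, weight in kg) data, invented for illustration.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [50.0, 58.0, 66.0, 74.0, 82.0]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(heights, weights)

# Use the fitted line to forecast the weight for a new height.
predicted_weight = slope * 175.0 + intercept

print(f"weight = {slope:.2f} * height + ({intercept:.2f})")
print(f"predicted weight at 175 cm: {predicted_weight:.1f} kg")
```

A positive slope corresponds to the statement above that weight increases with height. Real data would not fall exactly on a line, which is where residuals come in.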

Session-19
Performance Evaluation of Models
• Confusion Matrix
• Accuracy
• Precision
• Recall
• F1-Score
• Residual Errors

Confusion Matrix
A confusion matrix is a tabular
summary of the number of correct
and incorrect predictions made by
a classifier. It is used to measure the
performance of a classification
model.

Let’s take an example:
We have a total of 20 animals (cats and dogs), and our model predicts
whether each one is a cat or not. Here ‘cat’ is the positive class.
Actual values = [‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’,
‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’,
‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’,
‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’]
Predicted values = [‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’,
‘dog’, ‘cat’, ‘cat’, ‘cat’, ‘cat’,
‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’,
‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’]

Sl. No   Actual value   Predicted value   Match   Outcome
1        dog            dog               Yes     TN
2        cat            dog               No      FN
3        dog            dog               Yes     TN
4        cat            cat               Yes     TP
5        dog            dog               Yes     TN
6        dog            dog               Yes     TN
7        cat            cat               Yes     TP
8        dog            cat               No      FP
9        cat            cat               Yes     TP
10       dog            cat               No      FP
11       dog            dog               Yes     TN
12       dog            dog               Yes     TN
13       dog            dog               Yes     TN
14       cat            cat               Yes     TP
15       dog            dog               Yes     TN
16       dog            dog               Yes     TN
17       cat            cat               Yes     TP
18       dog            dog               Yes     TN
19       dog            dog               Yes     TN
20       cat            cat               Yes     TP
True Positive (TP) = 6
True Negative (TN) = 11
False Positive (Type 1 Error) (FP) = 2
False Negative (Type 2 Error) (FN) = 1
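The four counts above can be reproduced directly from the actual and predicted lists. The following is a minimal sketch in plain Python, treating 'cat' as the positive class; the function name `confusion_counts` is an illustrative choice, not part of any library. It also computes accuracy, taken here in its usual sense as the share of all predictions that are correct.

```python
actual = ["dog", "cat", "dog", "cat", "dog",
          "dog", "cat", "dog", "cat", "dog",
          "dog", "dog", "dog", "cat", "dog",
          "dog", "cat", "dog", "dog", "cat"]
predicted = ["dog", "dog", "dog", "cat", "dog",
             "dog", "cat", "cat", "cat", "cat",
             "dog", "dog", "dog", "cat", "dog",
             "dog", "cat", "dog", "dog", "cat"]


def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP, FN for a binary problem with the given positive label."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative (Type 1 error)
        elif a == positive:
            fn += 1          # predicted negative, actually positive (Type 2 error)
        else:
            tn += 1          # predicted negative, actually negative
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion_counts(actual, predicted, positive="cat")
accuracy = (tp + tn) / len(actual)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")   # TP=6 TN=11 FP=2 FN=1
print(f"accuracy = {accuracy:.2f}")         # accuracy = 0.85
```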

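Precision:
Precision is the fraction of
observations predicted as positive
that are actually positive:
Precision = TP / (TP + FP).
It is a useful metric in cases
where False Positives are a
higher concern than False
Negatives.
In spam detection, marking a
legitimate email as spam is a costly
False Positive, so the focus is
on precision.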
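As a worked check, precision can be computed from the cat/dog counts given earlier (TP = 6, FP = 2). This is a minimal sketch; the function name and the choice to return 0.0 when nothing is predicted positive are assumptions made for illustration.

```python
def precision(tp, fp):
    """Share of positive predictions that are correct: TP / (TP + FP)."""
    predicted_positive = tp + fp
    # Convention assumed here: return 0.0 if the model predicted no positives.
    return tp / predicted_positive if predicted_positive else 0.0


cat_precision = precision(tp=6, fp=2)
print(cat_precision)  # 0.75
```

Of the 8 animals the model labelled as cats, 6 really were cats.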

Recall:
Recall is the fraction of actual
positive observations that are
predicted as positive:
Recall = TP / (TP + FN).
It is also known as Sensitivity.
Recall is a good choice of
evaluation metric when we want
to capture as many positives as
possible, i.e. when False Negatives
are the higher concern.
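Using the cat/dog counts from the confusion matrix example (TP = 6, FN = 1), recall can be sketched as follows. The function name and the zero-division convention are illustrative assumptions.

```python
def recall(tp, fn):
    """Share of actual positives that the model found: TP / (TP + FN)."""
    actual_positive = tp + fn
    # Convention assumed here: return 0.0 if there are no actual positives.
    return tp / actual_positive if actual_positive else 0.0


cat_recall = recall(tp=6, fn=1)
print(round(cat_recall, 3))  # 0.857
```

Of the 7 actual cats, the model caught 6 and missed 1.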

F-measure / F1-Score
The F1 score is a number between 0 and 1 and is
the harmonic mean of precision and recall:
F1 = 2 × Precision × Recall / (Precision + Recall).
We use the harmonic mean because, unlike the simple average,
it is pulled toward the smaller of the two values, so one very
high value cannot hide a very low one.
The F1 score therefore balances precision and recall for your
classifier. If either precision or recall is low, the F1 score
is also low.
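The sketch below computes F1 for the cat/dog example (precision 6/8, recall 6/7) and then compares the harmonic mean with the simple average for a deliberately lopsided pair of values. The lopsided pair (1.0 and 0.1) is made up purely to illustrate the point about balance.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Cat/dog example: precision = 6/8, recall = 6/7.
cat_f1 = f1_score(6 / 8, 6 / 7)
print(f"F1 (cat/dog example) = {cat_f1:.3f}")   # 0.800

# Hypothetical lopsided classifier: perfect precision, very poor recall.
lopsided_f1 = f1_score(1.0, 0.1)
simple_average = (1.0 + 0.1) / 2
print(f"harmonic mean  = {lopsided_f1:.3f}")    # 0.182
print(f"simple average = {simple_average:.3f}") # 0.550
```

The simple average makes the lopsided classifier look moderately good, while the F1 score stays close to the weaker of the two metrics.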

Residuals
• A residual is a measure of how far a data point lies vertically
from the regression line. Simply, it is the error between the
observed actual value and the predicted value:
residual = observed value - predicted value.
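A short sketch of computing residuals for a regression line follows. The line `weight = 0.8 * height - 70` and the four observations are hypothetical values chosen for illustration; the sign convention used is observed minus predicted.

```python
def predict_weight(height_cm):
    """Hypothetical fitted line, assumed for illustration only."""
    return 0.8 * height_cm - 70.0


# Hypothetical (height in cm, observed weight in kg) pairs.
observations = [(150.0, 52.0), (160.0, 57.0), (170.0, 66.0), (180.0, 75.5)]

# Residual = observed value - predicted value, one per data point.
residuals = [weight - predict_weight(height) for height, weight in observations]

for (height, weight), res in zip(observations, residuals):
    print(f"height={height:.0f} observed={weight:.1f} "
          f"predicted={predict_weight(height):.1f} residual={res:+.1f}")
```

A positive residual means the point lies above the line (the model under-predicted); a negative residual means the point lies below it.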

Session-20
Project/Case Study for Data Analytics
• Mini-Project / Case Study
