SlideShare a Scribd company logo
Machine Learning
Presented By
Deepak Sharma
1
Topics Covered
▶ K Means
▶ Linear Regression
▶ Logistic Regression
▶ Decision Tree
▶ Random Forest
2
Machine Learning
Machine learning is an application of artificial intelligence
(AI) that provides systems the ability to automatically learn
and improve from experience without being explicitly
programmed.
Unsupervised learning is a type of machine learning
algorithm used to draw inferences from datasets consisting
of input data without labelled responses.
Supervised learning is inferring from labelled training data.
Unsupervised and Supervised Learning
3
4
K Means
▶ k-means is one of the simplest unsupervised learning algorithms that solve the
well known clustering problem.
▶ The algorithm classify a given data set through a certain number of clusters
(assume k clusters) fixed apriori.
▶ Algorithmic steps for k-means clustering
▶ Let X = {x1,x2,x3,……..,xn} be the set of data points and V = {v1,v2,…….,vc} be the set
of centers.
▶ 1) Randomly select ‘c’ cluster centers.
▶ 2) Calculate the eucledian distance between each data point and cluster centers.
▶ 3) Assign the data point to the cluster center whose distance from the cluster center is
minimum of all the cluster centers..
▶ 4) Recalculate the new cluster center
▶ 5) Recalculate the distance between each data point and new obtained cluster centers.
▶ 6) If no data point was reassigned then stop, otherwise repeat from step 3).
5
Optimal Value of K - Elbow Method
6
Advantage of K Means
▶ 1) Fast, robust and easier to understand and implement.
▶ 2) Gives best result when data set are distinct or well separated from each other.
▶ 1) The learning algorithm requires apriori specification of the number of cluster
centers.
▶ 2) The use of Exclusive Assignment - If there are two highly overlapping data then
k-means will not be able to resolve that there are two clusters.
▶ 3) Euclidean distance measures can unequally weight underlying factors.
▶ 4) The learning algorithm provides the local optima of the squared error function.
Disadvantage of K Means
7
Business Application
1) Document Classification - Cluster documents in multiple categories based on tags, topics,
and the content of the document.
2) Customer Segmentation - Clustering helps marketers improve their customer base, work
on target areas, and segment customers based on purchase history, interests, or activity
monitoring
3) Insurance Fraud Detection - Utilizing past historical data on fraudulent claims, it is
possible to isolate new claims based on its proximity to clusters that indicate fraudulent
patterns.
4) Cyber Profile Criminals -The idea of cyber profiling is derived from criminal profiles,
which provide information on the investigation division to classify the types of criminals
who were at the crime scene.
8
Linear Regression
▶ Linear regression is the simplest and most widely used statistical technique for
predictive modeling.
▶ It basically gives us an equation, where we have our features as independent
variables, on which our target variable is dependent upon.
▶ Linear regression equation looks like
▶ Here, we have Y as our dependent variable (Sales), X’s are the independent
variables and all thetas are the coefficients.
▶ Coefficients are basically the weights assigned to the features, based on their
importance.
▶ Residuals - The difference between the values predicted by us and the observed
values
9
Linear Regression
10
Explained & Unexplained Variation
11
Key Assumptions of Regression
The regression has five key assumptions:
▶ Linear relationship
▶ Multivariate normality
▶ No or little multicollinearity
▶ No auto-correlation
▶ Homoscedasticity
12
Goodness of Fit
▶ R Square - This statistic indicates the percentage of the
variance in the dependent variable that the independent
variables explain collectively.
▶ R-squared measures the strength of the relationship
between your model and the dependent variable on a
convenient 0 – 100% scale.
▶ Adjusted R Square - The adjusted R-squared is a modified
version of R-squared that has been adjusted for the
number of predictors in the model.
▶ The adjusted R-squared increases only if the new term
improves the model more than would be expected by
chance.
▶ It decreases when a predictor improves the model by less
than expected by chance.
13
Business Application
▶ A factory manager can create a statistical model to understand the impact of
oven temperature on the working of instrument
▶ A call centre can optimize customer handling operation by understanding the
impact of number of customers on wait time
▶ A hotel can forecast the number of staff required as per the demand fluctuate
14
Logistic Regression
▶ It’s a classification algorithm
▶ it predicts the probability of occurrence of an event by fitting data to a logit
function.
▶ Mathematically, logistic regression estimates a multiple linear regression
function defined as:
▶ this is interpreted as taking input log-odds and having output probability.
15
Sigmoid Function
16
Model Validation
▶ Confusion Matrix
17
ROC -Receiver Operating Characteristics
▶ The ROC curve shows the true positive rate (i.e., recall) against the false
positive rate.
▶ The False Positive Rate is the ratio of negative instances that are incorrectly
classified as positive. It is equal to one minus the true negative rate.
▶ Specificity - The True Negative rate is called specificity
▶ The ROC curve plots sensitivity (recall) versus 1-specificity
18
Decision Tree
▶ Decision tree is a type of supervised learning algorithm (having a pre-defined target variable) that
is mostly used in classification problems. It works for both categorical and continuous input and
output variables.
▶ Categorical Variable Decision Tree: Decision Tree which has categorical target variable then it
called as categorical variable decision tree. Example:- In above scenario of student problem, where
the target variable was “Student will play cricket or not” i.e. YES or NO.
▶ Continuous Variable Decision Tree: Decision Tree has continuous target variable then it is called as
Continuous Variable Decision Tree.
19
Terminology Used in Decision Tree
20
How does a tree
decide where to
split?
▶ Gini Index - if we select two items
from a population at random then
they must be of same class and
probability for this is 1 if population is
pure.
▶ It works with categorical target
variable “Success” or “Failure”.
▶ It performs only Binary splits
▶ Higher the value of Gini higher the
homogeneity.
▶ CART (Classification and
Regression Tree) uses Gini method
to create binary splits.
21
22
Information Gain
▶ How much information is required to explain the disorganisation in the system
▶ Here p and q is probability of success and failure respectively in that node.
Entropy
23
Advantage of Decision Tree
▶ Easy to Understand
▶ Useful in Data exploration
▶ Less data cleaning required
▶ Over fitting
▶ Not fit for continuous variables
Disadvantage of Decision Tree
24
Bias-Variance Tradeoff
25
Ensemble
▶ Ensemble methods is a machine learning technique that combines several base
models in order to produce one optimal predictive model
▶ Bagging - Here idea is to create several subsets of data from training sample
chosen randomly with replacement.
▶ Now, each collection of subset data is used to train their decision trees. As a
result, we end up with an ensemble of different models. Average of all the
predictions from different trees are used
▶ Boosting - In this technique, learners are learned sequentially with early learners
fitting simple models to the data and then analyzing data for errors.
Types of Ensemble Method
26
Random Forest
▶ Random Forest is an extension over bagging.
▶ It takes one extra step where in addition to taking the random subset of data, it
also takes the random selection of features rather than using all features to grow
trees.
▶ Handles higher dimensionality data very well.
▶ Handles missing values and maintains accuracy for missing data.
▶ Since final prediction is based on the mean predictions from subset trees, it won’t
give precise values for the regression model.
Advantages of Random Forest
Disadvantages of Random Forest
27
Thank You
28

More Related Content

Similar to Machine Learning Approach.pptx

Ad Click Prediction - Paper review
Ad Click Prediction - Paper reviewAd Click Prediction - Paper review
Ad Click Prediction - Paper review
Mazen Aly
 
Ds for finance day 3
Ds for finance day 3Ds for finance day 3
Ds for finance day 3
QuantUniversity
 
CUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONCUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTION
IRJET Journal
 
A detailed analysis of the supervised machine Learning Algorithms
A detailed analysis of the supervised machine Learning AlgorithmsA detailed analysis of the supervised machine Learning Algorithms
A detailed analysis of the supervised machine Learning Algorithms
NIET Journal of Engineering & Technology (NIETJET)
 
UNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
UNIT-II-Machine-Learning.pptx Machine Learning Different AI ModelsUNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
UNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
JVSTHARUNSAI
 
IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET Journal
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
Saad Elbeleidy
 
MIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaMIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_Bhatia
Rahul Bhatia
 
Second subjective assignment
Second  subjective assignmentSecond  subjective assignment
Second subjective assignment
yatheeshabodumalla
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3
Luis Borbon
 
Machine_Learning_Trushita
Machine_Learning_TrushitaMachine_Learning_Trushita
Machine_Learning_Trushita
Trushita Redij
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
Akshay Kanchan
 
Optimal Model Complexity (1).pptx
Optimal Model Complexity (1).pptxOptimal Model Complexity (1).pptx
Optimal Model Complexity (1).pptx
MurindanyiSudi1
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
jagan477830
 
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGESCASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
IRJET Journal
 
Data Analytics Using R - Report
Data Analytics Using R - ReportData Analytics Using R - Report
Data Analytics Using R - Report
Akanksha Gohil
 
Decision trees
Decision treesDecision trees
Decision trees
Ncib Lotfi
 
Top 20 Data Science Interview Questions and Answers in 2023.pptx
Top 20 Data Science Interview Questions and Answers in 2023.pptxTop 20 Data Science Interview Questions and Answers in 2023.pptx
Top 20 Data Science Interview Questions and Answers in 2023.pptx
AnanthReddy38
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Kim Hammar
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
narmeen11
 

Similar to Machine Learning Approach.pptx (20)

Ad Click Prediction - Paper review
Ad Click Prediction - Paper reviewAd Click Prediction - Paper review
Ad Click Prediction - Paper review
 
Ds for finance day 3
Ds for finance day 3Ds for finance day 3
Ds for finance day 3
 
CUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONCUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTION
 
A detailed analysis of the supervised machine Learning Algorithms
A detailed analysis of the supervised machine Learning AlgorithmsA detailed analysis of the supervised machine Learning Algorithms
A detailed analysis of the supervised machine Learning Algorithms
 
UNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
UNIT-II-Machine-Learning.pptx Machine Learning Different AI ModelsUNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
UNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
 
IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data Mining
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
MIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaMIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_Bhatia
 
Second subjective assignment
Second  subjective assignmentSecond  subjective assignment
Second subjective assignment
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3
 
Machine_Learning_Trushita
Machine_Learning_TrushitaMachine_Learning_Trushita
Machine_Learning_Trushita
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
Optimal Model Complexity (1).pptx
Optimal Model Complexity (1).pptxOptimal Model Complexity (1).pptx
Optimal Model Complexity (1).pptx
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGESCASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
 
Data Analytics Using R - Report
Data Analytics Using R - ReportData Analytics Using R - Report
Data Analytics Using R - Report
 
Decision trees
Decision treesDecision trees
Decision trees
 
Top 20 Data Science Interview Questions and Answers in 2023.pptx
Top 20 Data Science Interview Questions and Answers in 2023.pptxTop 20 Data Science Interview Questions and Answers in 2023.pptx
Top 20 Data Science Interview Questions and Answers in 2023.pptx
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 

Recently uploaded

State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 

Recently uploaded (20)

State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 

Machine Learning Approach.pptx

  • 2. Topics Covered ▶ K Means ▶ Linear Regression ▶ Logistic Regression ▶ Decision Tree ▶ Random Forest 2
  • 3. Machine Learning Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labelled responses. Supervised learning is inferring from labelled training data. Unsupervised and Supervised Learning 3
  • 4. 4
  • 5. K Means ▶ k-means is one of the simplest unsupervised learning algorithms that solve the well known clustering problem. ▶ The algorithm classify a given data set through a certain number of clusters (assume k clusters) fixed apriori. ▶ Algorithmic steps for k-means clustering ▶ Let X = {x1,x2,x3,……..,xn} be the set of data points and V = {v1,v2,…….,vc} be the set of centers. ▶ 1) Randomly select ‘c’ cluster centers. ▶ 2) Calculate the eucledian distance between each data point and cluster centers. ▶ 3) Assign the data point to the cluster center whose distance from the cluster center is minimum of all the cluster centers.. ▶ 4) Recalculate the new cluster center ▶ 5) Recalculate the distance between each data point and new obtained cluster centers. ▶ 6) If no data point was reassigned then stop, otherwise repeat from step 3). 5
  • 6. Optimal Value of K - Elbow Method 6
  • 7. Advantage of K Means ▶ 1) Fast, robust and easier to understand and implement. ▶ 2) Gives best result when data set are distinct or well separated from each other. ▶ 1) The learning algorithm requires apriori specification of the number of cluster centers. ▶ 2) The use of Exclusive Assignment - If there are two highly overlapping data then k-means will not be able to resolve that there are two clusters. ▶ 3) Euclidean distance measures can unequally weight underlying factors. ▶ 4) The learning algorithm provides the local optima of the squared error function. Disadvantage of K Means 7
  • 8. Business Application 1) Document Classification - Cluster documents in multiple categories based on tags, topics, and the content of the document. 2) Customer Segmentation - Clustering helps marketers improve their customer base, work on target areas, and segment customers based on purchase history, interests, or activity monitoring 3) Insurance Fraud Detection - Utilizing past historical data on fraudulent claims, it is possible to isolate new claims based on its proximity to clusters that indicate fraudulent patterns. 4) Cyber Profile Criminals -The idea of cyber profiling is derived from criminal profiles, which provide information on the investigation division to classify the types of criminals who were at the crime scene. 8
  • 9. Linear Regression ▶ Linear regression is the simplest and most widely used statistical technique for predictive modeling. ▶ It basically gives us an equation, where we have our features as independent variables, on which our target variable is dependent upon. ▶ Linear regression equation looks like ▶ Here, we have Y as our dependent variable (Sales), X’s are the independent variables and all thetas are the coefficients. ▶ Coefficients are basically the weights assigned to the features, based on their importance. ▶ Residuals - The difference between the values predicted by us and the observed values 9
  • 11. Explained & Unexplained Variation 11
  • 12. Key Assumptions of Regression The regression has five key assumptions: ▶ Linear relationship ▶ Multivariate normality ▶ No or little multicollinearity ▶ No auto-correlation ▶ Homoscedasticity 12
  • 13. Goodness of Fit ▶ R Square - This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. ▶ R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0 – 100% scale. ▶ Adjusted R Square - The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. ▶ The adjusted R-squared increases only if the new term improves the model more than would be expected by chance. ▶ It decreases when a predictor improves the model by less than expected by chance. 13
  • 14. Business Application ▶ A factory manager can create a statistical model to understand the impact of oven temperature on the working of instrument ▶ A call centre can optimize customer handling operation by understanding the impact of number of customers on wait time ▶ A hotel can forecast the number of staff required as per the demand fluctuate 14
  • 15. Logistic Regression ▶ It’s a classification algorithm ▶ it predicts the probability of occurrence of an event by fitting data to a logit function. ▶ Mathematically, logistic regression estimates a multiple linear regression function defined as: ▶ this is interpreted as taking input log-odds and having output probability. 15
  • 18. ROC -Receiver Operating Characteristics ▶ The ROC curve shows the true positive rate (i.e., recall) against the false positive rate. ▶ The False Positive Rate is the ratio of negative instances that are incorrectly classified as positive. It is equal to one minus the true negative rate. ▶ Specificity - The True Negative rate is called specificity ▶ The ROC curve plots sensitivity (recall) versus 1-specificity 18
  • 19. Decision Tree ▶ Decision tree is a type of supervised learning algorithm (having a pre-defined target variable) that is mostly used in classification problems. It works for both categorical and continuous input and output variables. ▶ Categorical Variable Decision Tree: Decision Tree which has categorical target variable then it called as categorical variable decision tree. Example:- In above scenario of student problem, where the target variable was “Student will play cricket or not” i.e. YES or NO. ▶ Continuous Variable Decision Tree: Decision Tree has continuous target variable then it is called as Continuous Variable Decision Tree. 19
  • 20. Terminology Used in Decision Tree 20
  • 21. How does a tree decide where to split? ▶ Gini Index - if we select two items from a population at random then they must be of same class and probability for this is 1 if population is pure. ▶ It works with categorical target variable “Success” or “Failure”. ▶ It performs only Binary splits ▶ Higher the value of Gini higher the homogeneity. ▶ CART (Classification and Regression Tree) uses Gini method to create binary splits. 21
  • 22. 22
  • 23. Information Gain ▶ How much information is required to explain the disorganisation in the system ▶ Here p and q is probability of success and failure respectively in that node. Entropy 23
  • 24. Advantage of Decision Tree ▶ Easy to Understand ▶ Useful in Data exploration ▶ Less data cleaning required ▶ Over fitting ▶ Not fit for continuous variables Disadvantage of Decision Tree 24
  • 26. Ensemble ▶ Ensemble methods is a machine learning technique that combines several base models in order to produce one optimal predictive model ▶ Bagging - Here idea is to create several subsets of data from training sample chosen randomly with replacement. ▶ Now, each collection of subset data is used to train their decision trees. As a result, we end up with an ensemble of different models. Average of all the predictions from different trees are used ▶ Boosting - In this technique, learners are learned sequentially with early learners fitting simple models to the data and then analyzing data for errors. Types of Ensemble Method 26
  • 27. Random Forest ▶ Random Forest is an extension over bagging. ▶ It takes one extra step where in addition to taking the random subset of data, it also takes the random selection of features rather than using all features to grow trees. ▶ Handles higher dimensionality data very well. ▶ Handles missing values and maintains accuracy for missing data. ▶ Since final prediction is based on the mean predictions from subset trees, it won’t give precise values for the regression model. Advantages of Random Forest Disadvantages of Random Forest 27