Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
Naive Bayes Classification
Introduction with example
Introduction
• Naive Bayes is a classification algorithm suitable for binary and multiclass classification.
• It is a supervised classification technique that classifies future objects by assigning class labels to instances (records) using conditional probability.
• In supervised classification, the training data are already labeled with a class.
• For example, if fraudulent transactions are already flagged in transactional data and we want to classify future transactions as fraudulent or non-fraudulent, the classification is supervised.
How it works!
• For each known class value C:
• Calculate the probability of each attribute conditional on the class value: $P(A_i \mid C)$
• Use the product rule to obtain a joint conditional probability for the attributes, and multiply it by the class probability: $P(C) \prod_{i=1}^{n} P(A_i \mid C)$
• Once this has been done for all class values, output the class with the highest probability.
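The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation any particular tool uses: the function names `fit_naive_bayes` and `predict` and the toy colour/shape data are hypothetical, and probabilities are plain frequency counts with no smoothing.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(records, labels):
    """Count-based estimates needed for P(C) and P(Ai | C)."""
    class_counts = Counter(labels)
    # value_counts[c][i][v] = number of class-c records whose attribute i equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[label][i][value] += 1
    return class_counts, value_counts


def predict(record, class_counts, value_counts):
    """Return the class with the highest P(C) * prod_i P(Ai | C), plus all scores."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # class probability P(C)
        for i, value in enumerate(record):
            score *= value_counts[c][i][value] / n_c  # P(Ai | C)
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (colour, shape) -> fruit
records = [("red", "round"), ("red", "round"), ("yellow", "long"), ("yellow", "round")]
labels = ["apple", "apple", "banana", "apple"]

model = fit_naive_bayes(records, labels)
best, scores = predict(("yellow", "long"), *model)
print(best, scores)
```

In this toy data no apple is long, so the apple score collapses to zero and the banana class wins.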
How it works! - Example
• For example, a fruit may be considered to be an apple if it is red, round, and about 3″ in diameter.
• Let’s say we have data on 1000 pieces of fruit. Each fruit is a Banana, an Orange or some Other fruit, and we know 3 features of each fruit: whether it is long or not, sweet or not, and yellow or not, as displayed in the table below:

Fruit    Long   Sweet   Yellow   Total
Banana   400    350     450      500
Orange   0      150     300      300
Other    100    150     50       200
Total    500    650     800      1000

• So from the table above we already know:
• 50% of the fruits are bananas
• 30% are oranges
• 20% are other fruits
How it works! - Example
• Let’s say we’re given the features of a piece of fruit and we need to predict the fruit class.
• If we’re told that the fruit is Long, Sweet and Yellow, we can classify it using the following approach:
• The probability of the class being “Banana” given the attributes “Long, Sweet and Yellow” is proportional to the product below. The denominator P(Long, Sweet, Yellow) is the same for every class, so it can be dropped when classes are compared:
• P(Banana | Long, Sweet, Yellow)
• ∝ P(Long | Banana) * P(Sweet | Banana) * P(Yellow | Banana) * P(Banana)
• = (400/500) * (350/500) * (450/500) * (500/1000) = 0.8 * 0.7 * 0.9 * 0.5
• = 0.252
How it works! - Example
• Similarly, we can compute the value for the Orange and Other classes and assign the class with the highest value.
• For the class “Orange” given the attributes “Long, Sweet and Yellow”, the probability is proportional to:
• P(Orange | Long, Sweet, Yellow)
• ∝ (0/300) * (150/300) * (300/300) * (300/1000) = 0
• For the class “Other fruit” given the attributes “Long, Sweet and Yellow”, the probability is proportional to:
• P(Other | Long, Sweet, Yellow)
• ∝ (100/200) * (150/200) * (50/200) * (200/1000)
• = 0.5 * 0.75 * 0.25 * 0.2 = 0.01875
• Thus the fruit is classified as a banana when the attributes are “long”, “sweet” and “yellow”, because 0.252 is the highest of the three values.
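The arithmetic of the fruit example can be checked with a short script. This is a sketch: the counts are the ones used in the calculations above, while the variable names and the final normalisation step (dividing by the sum of the three values so that they add up to 1) are additions for illustration.

```python
# Counts used in the worked example: (long, sweet, yellow, total) per class.
counts = {
    "Banana": (400, 350, 450, 500),
    "Orange": (0, 150, 300, 300),
    "Other": (100, 150, 50, 200),
}
n_fruit = sum(c[3] for c in counts.values())  # 1000 pieces of fruit

scores = {}
for fruit, (n_long, n_sweet, n_yellow, total) in counts.items():
    prior = total / n_fruit  # P(class)
    scores[fruit] = (n_long / total) * (n_sweet / total) * (n_yellow / total) * prior

# Dividing by the sum of the scores turns them into probabilities that add up to 1.
evidence = sum(scores.values())
posteriors = {fruit: s / evidence for fruit, s in scores.items()}

prediction = max(scores, key=scores.get)
print(scores)
print(posteriors)
print(prediction)
```

After normalisation the banana class carries roughly 93% of the probability, so the ranking is unchanged; only the scale differs.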
Standard Tuning/Input Parameters
Note: By default, the first variable is selected as the label and the remaining variables as features in Spark.
• Label: Provision to select the target variable (predefined classes).
• Features: Provision to select the predictors (independent variables).
• Lambda/Smoothing component: By default this value should be set to 1. It is for smoothing of categorical variables in the dataset. It is used primarily for scenarios when you expect to see attributes or data points in the test dataset which weren't present in the training dataset.
• Model type: By default the Multinomial option should be selected, as it is a generic model which can be used for binary as well as multiclass classification.
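To illustrate what the Lambda/smoothing component does, here is a minimal sketch of additive (Laplace) smoothing applied to one count from the fruit example. The function name `smoothed_probability` is hypothetical, and this is the textbook form of the formula; the exact expression a given tool such as Spark applies may differ in detail.

```python
def smoothed_probability(count, class_total, n_values, lam=1.0):
    """Additive smoothing: (count + lam) / (class_total + lam * n_values).

    count       -- records of the class that have the attribute value
    class_total -- all records of the class
    n_values    -- number of distinct values the attribute can take
    lam         -- smoothing parameter (Lambda); 0 disables smoothing
    """
    return (count + lam) / (class_total + lam * n_values)


# In the fruit example, 0 of the 300 oranges are long; "long" is a yes/no attribute.
unsmoothed = 0 / 300
smoothed = smoothed_probability(count=0, class_total=300, n_values=2, lam=1.0)
print(unsmoothed, smoothed)
```

With Lambda = 1 the estimate becomes 1/302 instead of 0, so a single unseen combination no longer forces the whole product for that class to zero.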
Sample UI For Input/Tuning Parameters & Output
Sample UI for selecting input parameters:
Select the variables you would like to use as predictors/features:
• Petal length (cm)
• Petal width (cm)
• Sepal length (cm)
• Flower class
Select the variable you would like to use as target variable:
• Petal length (cm)
• Petal width (cm)
• Sepal length (cm)
• Flower class
Sample UI for tuning parameters:
Tuning parameters
• Model type: Multinomial
• Lambda: 1
These values should be set as default values.
• # Classes in target variable: Three
This should be automatically detected based on the number of predefined categories present in the target variable.
• Categorical predictors: None
This should be automatically detected. If none of the variables are categorical in the training set then ‘None’ should be displayed as shown.
Sample UI for output:
Output will contain a predicted class column and a confusion matrix as shown below:
Predicted class column:

Petal length (cm)   Petal width (cm)   Actual Class   Predicted Class
5.1                 3.5                Versicolor     Versicolor
4.9                 3                  Virginica      Setosa
4.7                 3.2                Setosa         Setosa
4.6                 3.1                Versicolor     Versicolor
5                   3.6                Virginica      Virginica
5.4                 3.9                Versicolor     Virginica
4.6                 3.4                Versicolor     Versicolor

Each instance/record is assigned a class by the model as shown in the table above, and classification accuracy can be shown using a confusion matrix table.
Confusion matrix (rows: Actual, columns: Predicted):

             Setosa   Versicolor   Virginica
Setosa       50       0            0
Versicolor   0        42           8
Virginica    0        7            43

Prediction accuracy = 90%
o Prediction accuracy is a useful criterion for assessing model performance.
o As a rule of thumb, a model with prediction accuracy >= 70% is considered useful.
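The 90% accuracy figure follows directly from the confusion matrix: correct predictions sit on the diagonal. The sketch below assumes that rows are actual classes and columns are predicted classes, and that blank cells in the matrix are zero counts; the per-class recall at the end is an extra illustration not shown on the slide.

```python
classes = ["Setosa", "Versicolor", "Virginica"]

# Assumed layout: rows = actual class, columns = predicted class, blanks = 0.
confusion = [
    [50, 0, 0],
    [0, 42, 8],
    [0, 7, 43],
]

correct = sum(confusion[i][i] for i in range(len(classes)))  # diagonal cells
total = sum(sum(row) for row in confusion)
accuracy = correct / total
print(f"accuracy = {accuracy:.0%}")

# Share of each actual class that was predicted correctly.
recall = {c: confusion[i][i] / sum(confusion[i]) for i, c in enumerate(classes)}
print(recall)
```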
Limitations
o The Naive Bayes classifier assumes that every feature/predictor is independent of the others, which isn’t always the case.
o The training dataset should be large enough to represent the entire population, containing every combination of class label and attribute value.
  o If a class label and a certain attribute value never occur together in the training dataset (e.g. class="nice", shape="sphere"), then the frequency-based probability estimate will be zero for that combination in future data.
  o This problem happens when the training sample drawn from a population is not fully representative of that population.
o It performs better with categorical input variables than with numerical variables. For a numerical variable, a normal distribution is assumed, which is a strong assumption.
A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme. It looks like a bell curve.
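To make the normal-distribution assumption concrete, the sketch below shows how a numerical attribute could be scored: each class gets a mean and standard deviation estimated from its training values, and the bell-curve density at the new value plays the role of P(Ai | C). The class names and sample values are hypothetical, and this is one common way of doing it rather than a description of any specific product.

```python
import math
from statistics import mean, stdev


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical training values of one numerical attribute, grouped by class.
samples = {
    "short": [1.4, 1.5, 1.3, 1.6],
    "long": [4.7, 4.5, 4.9, 5.1],
}

# Per-class mean and standard deviation stand in for the frequency table.
params = {c: (mean(values), stdev(values)) for c, values in samples.items()}

x = 1.7  # new observation
likelihoods = {c: gaussian_pdf(x, mu, sigma) for c, (mu, sigma) in params.items()}
print(likelihoods)
```

If the real values are skewed or have several peaks, these densities can be misleading, which is why the assumption is described as strong.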
Business use cases
General applications
Credit/loan approval analysis
• Given a list of a client’s transactional attributes, predict whether the client will default or not on a bank loan.
Medical diagnosis
• Given a list of symptoms, predict if a patient has disease X/Y/Z.
Weather forecasting
• Based on temperature, humidity, pressure etc., predict if it will be rainy/sunny/windy tomorrow.
Treatment effectiveness analysis
• Based on a patient’s body attributes such as blood pressure, sugar, hemoglobin, name of a drug taken, type of treatment taken etc., check the likelihood of a disease being cured.
Fraud analysis
• Based on various bills submitted by an employee for reimbursement of food, travel, medical expenses etc., predict the likelihood of an employee committing fraud.
Use case 1
Business problem:
• A bank loans officer wants to predict whether a loan applicant will be a defaulter or non-defaulter based on attributes such as loan amount, monthly installment, employment tenure, times delinquent, annual income, debt to income ratio etc.
• Here the target variable would be ‘past default status’ and the predicted class would contain the values ‘yes’ or ‘no’, representing the ‘likely to default’ and ‘unlikely to default’ classes respectively.
Business benefit:
• Once classes are assigned, the bank will have a loan applicants’ dataset with each applicant labeled as “likely/unlikely to default”.
• Based on these labels, the bank can easily decide whether to give a loan to an applicant and, if yes, what credit limit and interest rate each applicant is eligible for based on the amount of risk involved.
Use case 1: Input dataset
Customer ID   Loan amount   Monthly installment   Annual income   Debt to income ratio   Times delinquent   Employment tenure   Past default status
1039153       21000         701.73                105000          9                      5                  4                   No
1069697       15000         483.38                92000           11                     5                  2                   No
1068120       25600         824.96                110000          10                     9                  2                   No
563175        23000         534.94                80000           9                      2                  12                  No
562842        19750         483.65                57228           11                     3                  21                  Yes
562681        25000         571.78                113000          10                     0                  9                   No
562404        21250         471.2                 31008           12                     1                  12                  Yes
700159        14400         448.99                82000           20                     6                  6                   No
696484        10000         241.33                45000           18                     8                  2                   Yes
702598        11700         381.61                45192           20                     7                  3                   Yes
702470        10000         243.29                38000           17                     9                  7                   Yes
702373        4800          144.77                54000           19                     8                  2                   Yes
701975        12500         455.81                43560           15                     8                  4                   Yes
Use case 1 : Output
Each record will have the predicted class assigned as shown below (column: Likelihood to default).

Customer ID   Loan amount   Monthly installment   Annual income   Debt to income ratio   Times delinquent   Employment tenure   Past default status   Likelihood to default
1039153       21000         701.73                105000          9                      5                  4                   No                    No
1069697       15000         483.38                92000           11                     5                  2                   No                    No
1068120       25600         824.96                110000          10                     9                  2                   No                    No
563175        23000         534.94                80000           9                      2                  12                  No                    No
562842        19750         483.65                57228           11                     3                  21                  Yes                   No
562681        25000         571.78                113000          10                     0                  9                   No                    No
562404        21250         471.2                 31008           12                     1                  12                  Yes                   Yes
700159        14400         448.99                82000           20                     6                  6                   No                    No
696484        10000         241.33                45000           18                     8                  2                   Yes                   Yes
702598        11700         381.61                45192           20                     7                  3                   Yes                   Yes
702470        10000         243.29                38000           17                     9                  7                   Yes                   Yes
702373        4800          144.77                54000           19                     8                  2                   Yes                   No
701975        12500         455.81                43560           15                     8                  4                   Yes                   Yes
Use case 1 : Output : Class profiles

Class (Likely to default)   Average loan amount   Average monthly installment   Average annual income   Average debt to income ratio   Average times delinquent   Average employment tenure
No                          10447.30              304.87                        66467.74                9.58                           1.69                       16.82
Yes                         7521.32               227.43                        60935.28                16.55                          6.91                       4.01

• As can be seen in the table above, there are distinctive characteristics of defaulters (Class: Yes) and non-defaulters (Class: No).
• Defaulters tend to be delinquent more often and to have a higher debt to income ratio and a lower employment tenure than non-defaulters.
• Hence delinquency, employment tenure and debt to income ratio are the determinant factors when it comes to classifying loan applicants into likely defaulters and non-defaulters.
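A class profile is simply a per-class average of each attribute. The sketch below computes it for three attributes over the 13 sample records of the output table, grouped by the predicted class. The function name `class_profiles` is hypothetical, and because only these 13 sample rows are used, the numbers will not reproduce the averages in the profile table, which presumably come from the full dataset.

```python
from collections import defaultdict

# (debt to income ratio, times delinquent, employment tenure, predicted class)
# for the 13 sample records of the output table.
rows = [
    (9, 5, 4, "No"), (11, 5, 2, "No"), (10, 9, 2, "No"), (9, 2, 12, "No"),
    (11, 3, 21, "No"), (10, 0, 9, "No"), (12, 1, 12, "Yes"), (20, 6, 6, "No"),
    (18, 8, 2, "Yes"), (20, 7, 3, "Yes"), (17, 9, 7, "Yes"), (19, 8, 2, "No"),
    (15, 8, 4, "Yes"),
]


def class_profiles(rows):
    """Average every numeric column separately for each class label."""
    grouped = defaultdict(list)
    for *features, label in rows:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(column) for column in zip(*records)]
        for label, records in grouped.items()
    }


profiles = class_profiles(rows)
for label, averages in profiles.items():
    print(label, averages)
```

Even on this small sample the direction matches the profile table: the “Yes” class has the higher debt to income ratio and delinquency count and the lower employment tenure.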
Use case 2
Business problem:
• A doctor/pharmacist wants to predict the likelihood of a new patient’s disease being cured or not cured based on various attributes of the patient such as blood pressure, hemoglobin level, sugar level, name of the drug given to the patient, name of the treatment given to the patient etc.
• Here the target variable would be ‘past cure status’ and the predicted class would contain the values ‘yes’ or ‘no’, meaning ‘prone to cure’ and ‘not prone to cure’ respectively.
Business benefit:
• Given the body profile of a patient and the recent treatments and drugs taken by him/her, the probability of a cure can be predicted and changes in treatment/drug can be suggested if required.
Use case 3
Business problem:
• Predict the disease based on a patient’s symptoms such as body temperature, level of blood pressure, weakness, nausea, indigestion etc.
• In this case, the target variable would be ‘past disease detected’ and the predicted class would contain values such as ‘malaria/typhoid/allergic rhinitis’ etc., representing the name of a likely disease.
Business benefit:
• Based on the symptoms diagnosed, a doctor or a pharmacist can predict the most likely disease which a patient is suffering from and suggest the appropriate drug/treatment accordingly.
Use case 4
Business problem:
• An accountant/human resource manager wants to predict the likelihood of an employee defrauding the company based on the various bills submitted by him/her so far, such as food bills, travel bills and medical bills.
• The target variable in this case would be ‘past fraud status’ and the predicted class would contain the values ‘yes’ or ‘no’, representing likely fraud and no fraud respectively.
Business benefit:
• Such classification can prevent a company from spending unreasonably on any employee and can in turn save the company budget by detecting such fraud beforehand.
Want to Learn More?
Get in touch with us @ support@Smarten.com
And do check out the Learning section on Smarten.com
June 2018

More Related Content

What's hot

Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Simplilearn
 
Decision trees and random forests
Decision trees and random forestsDecision trees and random forests
Decision trees and random forests
Debdoot Sheet
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
Kush Kulshrestha
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
Lippo Group Digital
 
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Universitat Politècnica de Catalunya
 
Model selection and cross validation techniques
Model selection and cross validation techniquesModel selection and cross validation techniques
Model selection and cross validation techniques
Venkata Reddy Konasani
 
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
Simplilearn
 
Understandig PCA and LDA
Understandig PCA and LDAUnderstandig PCA and LDA
Understandig PCA and LDA
Dr. Syed Hassan Amin
 
Decision tree
Decision treeDecision tree
Decision tree
Varun Jain
 
Decision tree
Decision treeDecision tree
Decision tree
Soujanya V
 
Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?
Pradeep Redddy Raamana
 
Customer Clustering For Retail Marketing
Customer Clustering For Retail MarketingCustomer Clustering For Retail Marketing
Customer Clustering For Retail Marketing
Jonathan Sedar
 
knn classification
knn classificationknn classification
knn classification
Akhilesh Joshi
 
Decision tree
Decision treeDecision tree
Decision tree
SEMINARGROOT
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
Krish_ver2
 
PhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive ClassificationPhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive Classification
Alejandro Correa Bahnsen, PhD
 
Default Credit Card Prediction
Default Credit Card PredictionDefault Credit Card Prediction
Default Credit Card Prediction
Alexandre Pinto
 
K means clustering
K means clusteringK means clustering
K means clustering
keshav goyal
 
Machine learning & computer vision
Machine learning & computer visionMachine learning & computer vision
Machine learning & computer vision
Netlight Consulting
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
Rashid Ansari
 

What's hot (20)

Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
 
Decision trees and random forests
Decision trees and random forestsDecision trees and random forests
Decision trees and random forests
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
 
Model selection and cross validation techniques
Model selection and cross validation techniquesModel selection and cross validation techniques
Model selection and cross validation techniques
 
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da...
 
Understandig PCA and LDA
Understandig PCA and LDAUnderstandig PCA and LDA
Understandig PCA and LDA
 
Decision tree
Decision treeDecision tree
Decision tree
 
Decision tree
Decision treeDecision tree
Decision tree
 
Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?
 
Customer Clustering For Retail Marketing
Customer Clustering For Retail MarketingCustomer Clustering For Retail Marketing
Customer Clustering For Retail Marketing
 
knn classification
knn classificationknn classification
knn classification
 
Decision tree
Decision treeDecision tree
Decision tree
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
 
PhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive ClassificationPhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive Classification
 
Default Credit Card Prediction
Default Credit Card PredictionDefault Credit Card Prediction
Default Credit Card Prediction
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Machine learning & computer vision
Machine learning & computer visionMachine learning & computer vision
Machine learning & computer vision
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
 

Similar to What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?

What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
Smarten Augmented Analytics
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in Agriculture
Aman Vasisht
 
What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?
Smarten Augmented Analytics
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slides
QuantUniversity
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
Marc Berman
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
rajalakshmi5921
 
CHAPTER 11 LOGISTIC REGRESSION.pptx
CHAPTER 11 LOGISTIC REGRESSION.pptxCHAPTER 11 LOGISTIC REGRESSION.pptx
CHAPTER 11 LOGISTIC REGRESSION.pptx
UmaDeviAnanth
 
What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?
Smarten Augmented Analytics
 
customer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedincustomer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedinAsoka Korale
 
data analysis techniques and statistical softwares
data analysis techniques and statistical softwaresdata analysis techniques and statistical softwares
data analysis techniques and statistical softwares
Dr.ammara khakwani
 
What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?
Smarten Augmented Analytics
 
What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...
Smarten Augmented Analytics
 
Supervised Learning-Unit 3.pptx
Supervised Learning-Unit 3.pptxSupervised Learning-Unit 3.pptx
Supervised Learning-Unit 3.pptx
nehashanbhag5
 
Machine_Learning.pptx
Machine_Learning.pptxMachine_Learning.pptx
Machine_Learning.pptx
VickyKumar131533
 
Module 4: Model Selection and Evaluation
Module 4: Model Selection and EvaluationModule 4: Model Selection and Evaluation
Module 4: Model Selection and Evaluation
Sara Hooker
 
Discriminant analysis using spss
Discriminant analysis using spssDiscriminant analysis using spss
Discriminant analysis using spss
Dr Nisha Arora
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
Roger Barga
 
scrib.pptx
scrib.pptxscrib.pptx
scrib.pptx
BhavanaMU012
 
Intro to data science
Intro to data scienceIntro to data science
Intro to data science
ANURAG SINGH
 
Introduction To Data Science Using R
Introduction To Data Science Using RIntroduction To Data Science Using R
Introduction To Data Science Using R
ANURAG SINGH
 

Similar to What is Naïve Bayes Classification and How is it Used for Enterprise Analysis? (20)

What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in Agriculture
 
What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slides
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 
CHAPTER 11 LOGISTIC REGRESSION.pptx
CHAPTER 11 LOGISTIC REGRESSION.pptxCHAPTER 11 LOGISTIC REGRESSION.pptx
CHAPTER 11 LOGISTIC REGRESSION.pptx
 
What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?
 
customer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedincustomer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedin
 
data analysis techniques and statistical softwares
data analysis techniques and statistical softwaresdata analysis techniques and statistical softwares
data analysis techniques and statistical softwares
 
What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?
 
What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...
 
Supervised Learning-Unit 3.pptx
Supervised Learning-Unit 3.pptxSupervised Learning-Unit 3.pptx
Supervised Learning-Unit 3.pptx
 
Machine_Learning.pptx
Machine_Learning.pptxMachine_Learning.pptx
Machine_Learning.pptx
 
Module 4: Model Selection and Evaluation
Module 4: Model Selection and EvaluationModule 4: Model Selection and Evaluation
Module 4: Model Selection and Evaluation
 
Discriminant analysis using spss
Discriminant analysis using spssDiscriminant analysis using spss
Discriminant analysis using spss
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
scrib.pptx
scrib.pptxscrib.pptx
scrib.pptx
 
Intro to data science
Intro to data scienceIntro to data science
Intro to data science
 
Introduction To Data Science Using R
Introduction To Data Science Using RIntroduction To Data Science Using R
Introduction To Data Science Using R
 

More from Smarten Augmented Analytics

Crime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – SmartenCrime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – Smarten
Smarten Augmented Analytics
 
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
Smarten Augmented Analytics
 
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
Smarten Augmented Analytics
 
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
Smarten Augmented Analytics
 
Students' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – SmartenStudents' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – Smarten
Smarten Augmented Analytics
 
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values  Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Smarten Augmented Analytics
 
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Smarten Augmented Analytics
 
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
Smarten Augmented Analytics
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
Smarten Augmented Analytics
 
Fraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – SmartenFraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – Smarten
Smarten Augmented Analytics
 
Quality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - SmartenQuality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - Smarten
Smarten Augmented Analytics
 
Machine Maintenance Management Predictive Analytics Use Case - Smarten
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?

  • 1. Master the Art of Analytics: A Simplistic Explainer Series For Citizen Data Scientists. Journey Towards Augmented Analytics
  • 4. Introduction
    • Naive Bayes is a classification algorithm suitable for binary and multiclass classification.
    • It is a supervised classification technique that classifies future objects by assigning class labels to instances/records using conditional probability.
    • In supervised classification, the training data are already labeled with a class.
    • For example, if fraudulent transactions are already flagged in transactional data and we want to classify future transactions as fraudulent or non-fraudulent, the classification is supervised.
  • 5. How it works!
    • For each known class value C:
    • Calculate the probability of each attribute, conditional on the class value: $P(A_i \mid C)$
    • Use the product rule to combine them into a joint conditional probability for the attributes and multiply by the class prior: $P(C)\prod_{i=1}^{n} P(A_i \mid C)$
    • Once this has been done for all class values, output the class with the highest value.
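The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used by any particular tool. The function names and the two-class fraud numbers are invented for the example.

```python
from math import prod

def naive_bayes_scores(priors, likelihoods, observed):
    """Return the score P(C) * prod_i P(A_i | C) for each class C.

    priors:      {class: P(C)}
    likelihoods: {class: {attribute: P(attribute | C)}}
    observed:    attributes present in the record being classified
    """
    return {
        c: priors[c] * prod(likelihoods[c][a] for a in observed)
        for c in priors
    }

def classify(priors, likelihoods, observed):
    """Pick the class with the highest score."""
    scores = naive_bayes_scores(priors, likelihoods, observed)
    return max(scores, key=scores.get)

# Hypothetical numbers, invented purely for illustration.
priors = {"fraud": 0.1, "ok": 0.9}
likelihoods = {
    "fraud": {"night": 0.7, "foreign": 0.6},
    "ok": {"night": 0.2, "foreign": 0.1},
}
scores = naive_bayes_scores(priors, likelihoods, ["night", "foreign"])
predicted = classify(priors, likelihoods, ["night", "foreign"])
print(scores, predicted)
```

The scores are not normalized, so they do not sum to 1. That does not matter for picking the winning class, because every class shares the same denominator.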
  • 6. How it works! - Example
    • For example, a fruit may be considered to be an apple if it is red, round, and about 3″ in diameter.
    • Let’s say we have data on 1000 pieces of fruit. Each fruit is a Banana, an Orange or some Other fruit, and we know 3 features of each fruit: whether it is long or not, sweet or not, and yellow or not. The counts below are the ones used in the calculations on slides 7 and 8:

    Fruit    Long  Sweet  Yellow  Total
    Banana   400   350    450     500
    Orange   0     150    300     300
    Other    100   150    50      200

    • So from the table above we already know:
    • 50% of the fruits are bananas
    • 30% are oranges
    • 20% are other fruits
  • 7. How it works! - Example
    • Let’s say we’re given the features of a piece of fruit and need to predict its class.
    • If we’re told that the fruit is Long, Sweet and Yellow, we can classify it as follows.
    • The score for the class “Banana” given the attributes “Long, Sweet and Yellow” is proportional to the probability of that class, because the denominator P(Long, Sweet, Yellow) is the same for every class and can be left out:
    • P(Banana | Long, Sweet, Yellow) ∝ P(Long | Banana) * P(Sweet | Banana) * P(Yellow | Banana) * P(Banana)
    • = (400/500) * (350/500) * (450/500) * (500/1000) = 0.8 * 0.7 * 0.9 * 0.5
    • = 0.252
  • 8. How it works! - Example
    • Similarly, we can compute the value for the Orange and Other classes and assign the class with the highest value.
    • For the class “Orange” given the attributes “Long, Sweet and Yellow”:
    • P(Orange | Long, Sweet, Yellow) ∝ (0/300) * (150/300) * (300/300) * (300/1000) = 0
    • For the class “Other fruit” given the attributes “Long, Sweet and Yellow”:
    • P(Other | Long, Sweet, Yellow) ∝ (100/200) * (150/200) * (50/200) * (200/1000)
    • = 0.5 * 0.75 * 0.25 * 0.2 = 0.01875
    • Thus the fruit is classified as a banana if the attributes are “long”, “sweet” and “yellow”.
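The fruit example can be reproduced directly from the counts that appear in the fractions on slides 7 and 8. The sketch below is illustrative only. The variable names are invented, and the final normalization step is an addition that the slides do not perform.

```python
# Counts taken from the fractions on slides 7 and 8.
counts = {
    "Banana": {"Long": 400, "Sweet": 350, "Yellow": 450, "total": 500},
    "Orange": {"Long": 0, "Sweet": 150, "Yellow": 300, "total": 300},
    "Other": {"Long": 100, "Sweet": 150, "Yellow": 50, "total": 200},
}
n_fruits = sum(c["total"] for c in counts.values())  # 1000 pieces of fruit

def score(fruit, features):
    """P(fruit) multiplied by P(feature | fruit) for each observed feature."""
    c = counts[fruit]
    s = c["total"] / n_fruits
    for f in features:
        s *= c[f] / c["total"]
    return s

features = ["Long", "Sweet", "Yellow"]
scores = {fruit: score(fruit, features) for fruit in counts}
best = max(scores, key=scores.get)

# Optional: divide by the sum to turn the scores into probabilities.
total = sum(scores.values())
posteriors = {fruit: s / total for fruit, s in scores.items()}
print(scores, best, posteriors)
```

After normalization the Banana class gets a probability of about 0.93, which shows how decisive the result is compared with the raw score of 0.252.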
  • 10. Standard Tuning/Input Parameters
    • Label: provision to select the target variable/predefined classes.
    • Features: provision to select the predictors/independent variables.
    • Lambda/Smoothing component: by default this value should be set to 1. It smooths the categorical variables in the dataset and is used primarily when you expect to see attributes or data points in the test dataset that were not present in the training dataset.
    • Model type: by default the Multinomial option should be selected, as it is a generic model that can be used for binary as well as multiclass classification.
    • Note: by default, the first variable is selected as the label and the remaining variables as features in Spark.
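To see what the lambda/smoothing parameter does, consider the zero that knocked out the Orange class in the fruit example (0 long oranges out of 300). The sketch below uses the textbook additive (Laplace) smoothing formula for a categorical attribute. The function name is invented, and a given library may use a slightly different denominator, so treat this as an illustration of the idea rather than the exact Spark computation.

```python
def smoothed_likelihood(count, class_total, n_values, lam=1.0):
    """Additive smoothing: (count + lam) / (class_total + lam * n_values).

    count:       records of the class that have this attribute value
    class_total: all records of the class
    n_values:    number of distinct values the attribute can take
    lam:         smoothing strength (0 disables smoothing)
    """
    return (count + lam) / (class_total + lam * n_values)

# "Long" has two possible values (long / not long).
unsmoothed = smoothed_likelihood(0, 300, 2, lam=0)  # 0/300
smoothed = smoothed_likelihood(0, 300, 2, lam=1)    # 1/302
print(unsmoothed, smoothed)
```

With lambda set to 1, an attribute value never seen with a class gets a small non-zero probability instead of forcing the whole product to zero.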
  • 12. Sample UI for selecting input parameters:
    • Select the variables you would like to use as predictors/features: Petal length (cm), Petal width (cm), Sepal length (cm), Flower class
    • Select the variable you would like to use as target variable: Petal length (cm), Petal width (cm), Sepal length (cm), Flower class
  • 13. Sample UI for tuning parameters:

    Tuning parameter                 Value
    Model type                       Multinomial
    Lambda                           1
    # Classes in target variable     Three
    Categorical predictors           None

    • Model type and Lambda: these values should be set as default values.
    • # Classes in target variable: this should be automatically detected from the number of predefined categories present in the target variable.
    • Categorical predictors: this should be automatically detected. If none of the variables in the training set are categorical, ‘None’ should be displayed as shown.
  • 14. Sample UI for output:
    • The output will contain a predicted class column and a confusion matrix.
    • Each instance/record is assigned a class by the model, and classification accuracy can be shown using a confusion matrix.

    Predicted class column:

    Petal length (cm)  Petal width (cm)  Actual Class  Predicted Class
    5.1                3.5               Versicolor    Versicolor
    4.9                3                 Virginica     Setosa
    4.7                3.2               Setosa        Setosa
    4.6                3.1               Versicolor    Versicolor
    5                  3.6               Virginica     Virginica
    5.4                3.9               Versicolor    Virginica
    4.6                3.4               Versicolor    Versicolor

    Confusion matrix (Actual vs. Predicted):

                 Setosa  Versicolor  Virginica
    Setosa       50
    Versicolor           42          8
    Virginica            7           43

    Prediction accuracy = 90%

    • Prediction accuracy is a useful criterion for assessing model performance.
    • A model with prediction accuracy >= 70% is useful.
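The 90% figure can be checked from the confusion matrix: accuracy is the sum of the diagonal divided by the total number of records. The sketch below makes two assumptions that the slide does not state: blank cells are zero, and rows are actual classes while columns are predicted classes.

```python
labels = ["Setosa", "Versicolor", "Virginica"]

# Assumption: rows are actual classes, columns are predicted classes,
# and blank cells on the slide are zeros.
confusion = [
    [50, 0, 0],
    [0, 42, 8],
    [0, 7, 43],
]

correct = sum(confusion[i][i] for i in range(len(labels)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

# Share of each actual class that was predicted correctly.
per_class = {
    labels[i]: confusion[i][i] / sum(confusion[i])
    for i in range(len(labels))
}
print(accuracy, per_class)
```

The per-class figures show where the errors sit: Setosa is always recognized, while Versicolor and Virginica are sometimes confused with each other.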
  • 16. Limitations
    • The Naive Bayes classifier assumes that every feature/predictor is independent, which isn’t always the case.
    • The training dataset should be large enough to represent the entire population, containing every combination of class label and attributes.
    • If a class label and a certain attribute value never occur together in the training dataset (e.g. class="nice", shape="sphere"), the frequency-based probability estimate will be zero for that combination in future data.
    • This problem arises when the training sample drawn from a population is not fully representative of that population.
    • It performs better with categorical input variables than with numerical variables. For numerical variables a normal distribution is assumed, which is a strong assumption.
    • A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme. It looks like a bell curve.
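For numerical variables, the normal-distribution assumption means that the conditional probability of an attribute is replaced by a bell-curve density fitted to the values seen for each class. The sketch below illustrates that idea with invented petal lengths. The function name and the sample values are hypothetical and do not come from the deck.

```python
from math import exp, pi, sqrt
from statistics import mean, pvariance

def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var at x."""
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)

# Hypothetical petal lengths (cm) observed for one class.
sample = [1.4, 1.3, 1.5, 1.7, 1.4]
mu, var = mean(sample), pvariance(sample)

# Plays the role of P(A_i | C) for a new record with petal length 1.6.
likelihood = gaussian_pdf(1.6, mu, var)
print(mu, var, likelihood)
```

If the real values of a variable are strongly skewed or have several peaks, this fitted curve describes them poorly, which is why the assumption is called strong.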
  • 18. General applications
    • Credit/loan approval analysis: given a list of a client’s transactional attributes, predict whether the client will default on a bank loan.
    • Medical diagnosis: given a list of symptoms, predict whether a patient has disease X/Y/Z.
    • Weather forecasting: based on temperature, humidity, pressure etc., predict whether it will be rainy/sunny/windy tomorrow.
    • Treatment effectiveness analysis: based on a patient’s body attributes such as blood pressure, sugar and hemoglobin, the name of the drug taken, the type of treatment taken etc., check the likelihood of a disease being cured.
    • Fraud analysis: based on the various bills an employee submits for reimbursement of food, travel, medical expenses etc., predict the likelihood of the employee committing fraud.
  • 19. Use case 1
    • Business problem:
    • A bank loans officer wants to predict whether a loan applicant will default, based on attributes such as Loan amount, Monthly installment, Employment tenure, Times delinquent, Annual income, Debt to income ratio etc.
    • Here the target variable is ‘past default status’, and the predicted class contains the values ‘yes’ or ‘no’, representing ‘likely to default’ and ‘unlikely to default’ respectively.
    • Business benefit:
    • Once classes are assigned, the bank will have a loan applicants’ dataset with each applicant labeled as “likely/unlikely to default”.
    • Based on these labels, the bank can easily decide whether to give a loan to an applicant and, if so, what credit limit and interest rate the applicant is eligible for given the risk involved.
  • 20. Use case 1: Input dataset

    Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Past default status
    1039153      21000        701.73               105000         9                     5                 4                  No
    1069697      15000        483.38               92000          11                    5                 2                  No
    1068120      25600        824.96               110000         10                    9                 2                  No
    563175       23000        534.94               80000          9                     2                 12                 No
    562842       19750        483.65               57228          11                    3                 21                 Yes
    562681       25000        571.78               113000         10                    0                 9                  No
    562404       21250        471.2                31008          12                    1                 12                 Yes
    700159       14400        448.99               82000          20                    6                 6                  No
    696484       10000        241.33               45000          18                    8                 2                  Yes
    702598       11700        381.61               45192          20                    7                 3                  Yes
    702470       10000        243.29               38000          17                    9                 7                  Yes
    702373       4800         144.77               54000          19                    8                 2                  Yes
    701975       12500        455.81               43560          15                    8                 4                  Yes
  • 21. Use case 1: Output
    • Each record will have the predicted class assigned as shown below (column: Likelihood to default).

    Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Past default status  Likelihood to default
    1039153      21000        701.73               105000         9                     5                 4                  No                   No
    1069697      15000        483.38               92000          11                    5                 2                  No                   No
    1068120      25600        824.96               110000         10                    9                 2                  No                   No
    563175       23000        534.94               80000          9                     2                 12                 No                   No
    562842       19750        483.65               57228          11                    3                 21                 Yes                  No
    562681       25000        571.78               113000         10                    0                 9                  No                   No
    562404       21250        471.2                31008          12                    1                 12                 Yes                  Yes
    700159       14400        448.99               82000          20                    6                 6                  No                   No
    696484       10000        241.33               45000          18                    8                 2                  Yes                  Yes
    702598       11700        381.61               45192          20                    7                 3                  Yes                  Yes
    702470       10000        243.29               38000          17                    9                 7                  Yes                  Yes
    702373       4800         144.77               54000          19                    8                 2                  Yes                  No
    701975       12500        455.81               43560          15                    8                 4                  Yes                  Yes
  • 22. Use case 1: Output: Class profiles

    Class (Likely to default)  Average loan amount  Average monthly installment  Average annual income  Average debt to income ratio  Average times delinquent  Average employment tenure
    No                         10447.30             304.87                       66467.74               9.58                          1.69                      16.82
    Yes                        7521.32              227.43                       60935.28               16.55                         6.91                      4.01

    • As can be seen in the table above, defaulters (Class: Yes) and non-defaulters (Class: No) have distinctive characteristics.
    • Defaulters tend to be delinquent more often and to have a higher debt to income ratio and a lower employment tenure than non-defaulters.
    • Hence delinquency, employment tenure and debt to income ratio are the determinant factors when classifying loan applicants into likely defaulters and non-defaulters.
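A class profile is simply the per-class average of each attribute, grouped by the predicted class. The sketch below computes profiles for three attributes using only the 13 sample rows from the Use case 1 output table. Its averages therefore differ from the class profile figures on the slide, which appear to be based on a larger dataset. The variable names are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# (debt to income ratio, times delinquent, employment tenure, predicted class)
# copied from the 13 sample rows of the Use case 1 output table.
rows = [
    (9, 5, 4, "No"), (11, 5, 2, "No"), (10, 9, 2, "No"), (9, 2, 12, "No"),
    (11, 3, 21, "No"), (10, 0, 9, "No"), (12, 1, 12, "Yes"), (20, 6, 6, "No"),
    (18, 8, 2, "Yes"), (20, 7, 3, "Yes"), (17, 9, 7, "Yes"), (19, 8, 2, "No"),
    (15, 8, 4, "Yes"),
]

# Group the records by predicted class.
groups = defaultdict(list)
for dti, delinquent, tenure, predicted in rows:
    groups[predicted].append((dti, delinquent, tenure))

# Average each attribute within each class.
profiles = {
    cls: {
        "debt_to_income": mean(r[0] for r in members),
        "times_delinquent": mean(r[1] for r in members),
        "employment_tenure": mean(r[2] for r in members),
    }
    for cls, members in groups.items()
}
print(profiles)
```

Even on this small sample, the predicted defaulters show a higher average debt to income ratio than the predicted non-defaulters.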
  • 23. Use case 2
    • Business problem:
    • A doctor/pharmacist wants to predict the likelihood of a new patient’s disease being cured or not, based on attributes of the patient such as blood pressure, hemoglobin level, sugar level, the name of the drug given to the patient, the name of the treatment given to the patient etc.
    • Here the target variable is ‘past cure status’, and the predicted class contains the values ‘yes’ or ‘no’, meaning ‘prone to cure’ and ‘not prone to cure’ respectively.
    • Business benefit:
    • Given the body profile of a patient and the recent treatments and drugs taken, the probability of a cure can be predicted and changes in treatment/drug can be suggested if required.
  • 24. Use case 3
    • Business problem:
    • Predict the disease based on a patient’s symptoms such as body temperature, blood pressure level, weakness, nausea, indigestion etc.
    • In this case, the target variable is ‘past disease detected’, and the predicted class contains values such as ‘malaria/typhoid/allergic rhinitis’ etc., representing the name of a likely disease.
    • Business benefit:
    • Based on the symptoms diagnosed, a doctor or a pharmacist can predict the disease a patient is most likely suffering from and suggest the appropriate drug/treatment accordingly.
  • 25. Use case 4
    • Business problem:
    • An accountant/human resource manager wants to predict the likelihood of an employee defrauding the company, based on the bills the employee has submitted so far, such as food, travel and medical bills.
    • The target variable in this case is ‘past fraud status’, and the predicted class contains the values ‘yes’ or ‘no’, representing likely fraud and no fraud respectively.
    • Business benefit:
    • Such classification can prevent a company from spending unreasonably on any employee and can in turn protect the company budget by detecting such fraud beforehand.
  • 26. Want to Learn More? Get in touch with us at support@Smarten.com, and do check out the Learning section on Smarten.com. June 2018