INTERNATIONAL JOURNAL ON ORANGE TECHNOLOGIES (IJOT) | e-ISSN: 2615-814 | p-ISSN: 2615-7071
Volume: 01 Issue: 02 | Nov-Dec 2019 | www.researchparks.org
© 2019, IJOT | Research Parks Publishing (IDEAS Lab)
Naïve Bayes Machine Learning Classification with R Programming: A
case study of binary data sets
Yagyanath Rimal1
1School of Engineering, Pokhara University, Nepal
E-mail: rimal.yagya@pu.edu.np
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This analytical review paper explains the Naïve Bayes machine learning technique, a simple probabilistic classifier based on Bayes' theorem with an assumption of independence between the features, using R programming. There is still a large gap in guidance on which algorithm is suitable when a largely categorical variable has to be predicted from research data. For the implementation of the Naïve Bayes classification, the model is trained on a training data set and then used to make predictions on a test data set. The strength of the technique is that it takes in new information and tries to make a better forecast by considering the new evidence, which is useful when the input variables are largely categorical; this is quite similar to how the human mind selects a judgement from various alternative choices. The study uses the binary.csv data set of 400 observations with four attributes from an educational context: admit is the dependent variable, determined by the gre score, gpa and the rank of the previous institution, which together indicate whether a student will be admitted to the next program. Initially, the gre and gpa variables show a modest correlation of about 0.36, and both are associated with the categorical rank variable. The box plots and density plots show considerable overlap between the admitted and not-admitted groups. The fitted naïve Bayes model assigns prior probabilities of 0.68 to not admitted and 0.31 to admitted. The confusion matrix and the predictions on the test set give 68 percent accuracy with a 95 percent confidence interval. The misclassification errors are about 29 percent on the training data and 32 percent on the test data, and fitting the naïve Bayes model with kernel density estimation (usekernel = TRUE) slightly raises the training accuracy, which ultimately decreases misclassification on the binary data set.
Key Words: Naive Bayes Classifier, Supervised Learning.
1. INTRODUCTION
Machine learning is the study of algorithms that learn from input data and can therefore predict outcomes for future
analysis. Such systems use complex models and self-learning algorithms to make predictions, which in commercial
use is known as predictive analytics [1]. Traditionally, statisticians processed data in the form of variables
through conventional programming [2]. Google, Facebook and other social networks rely to a large extent on
classification algorithms for automatically designed pages. Learning algorithms fall broadly into two types,
supervised and unsupervised [3]. Supervised learning focuses mainly on classification: identifying the category of
an observation with the help of a training set of labelled observations, which eventually allows the variables of
the research data to be classified into different classes. Naïve Bayes is a classification technique based on
Bayes' theorem with an assumption of independence among predictors [4]. In simple terms, a Naive Bayes classifier
assumes that the presence of a particular attribute in a class is unrelated to the presence of any other attribute,
and that the final decision rests on the combined evidence [5]. The naive Bayes classifier has its fundamental
pillar in Bayes' theorem, as explained by probability theory.
Probability is the chance that one event among various possible cases occurs. Probability relates to our
everyday life and helps us solve many real problems. The probability of a single event is calculated as the
proportion of cases in which that particular event occurs. Similarly, the probability of a group of events is
calculated as the proportion of cases in which the group of events occurs together [6]. In the same way, the
individual conditional probabilities are calculated: all the probabilities under the "yes" outcome are computed
first and multiplied together, the same is done for the "no" outcome, and each product is then normalized by the
total evidence. Finally, the probabilities of "yes" and "no" are compared; if the "yes" probability is greater,
the forecast is "yes", otherwise "no". As a simple example, consider a data set of football players who may or
may not play under different conditions such as windy, moderate, summer, winter or monsoon seasons; there are many
categories, and thus many conditional probabilities of playing habits, possibly in combination. When building the
probability table, the frequency of each case is counted and its probability is predicted from it [3].
Therefore, the probability of playing given that the season is summer is P(Yes | Summer) = P(Summer | Yes) × P(Yes)
/ P(Summer) = (3/9 × 9/14) / (5/14) = 0.60 (with rounded values, 0.33 × 0.64 / 0.36 ≈ 0.59), which is
compared with the threshold probability of 0.5 for the forecast. In this way, the naive Bayes classification tries
to predict the most likely outcome among many alternative cases. In general, P(C | X) = P(X | C) × P(C) / P(X),
that is, the likelihood multiplied by the prior probability of the class, divided by the prior probability of the
predictor. Here, P(C) is the prior probability, which describes the degree to which we believe the model describes
reality based on all our previous information. Similarly, P(X) is a normalization constant that makes the posterior
density integrate to one. The posterior probability P(C | X) represents the degree to which we believe that a given
model accurately describes the situation, given the available data and all our previous information. Similarly,
P(X | C) is the likelihood, the
probability that describes how well the model predicts data [7]. Machine learning is a category of algorithms that
allows software applications to be more precise in predicting results without being explicitly programmed. The
basic premise of machine learning is to create algorithms that can receive input data and use statistical analysis
to predict an output, while outputs are updated as soon as new data become available. The Naive Bayes
classification is an important tool related to the analysis of large data or work in the field of data science [8]. R is
a free software environment for statistical and graphical processing and is widely used by the academic and
industrial world. The Naive Bayes classifier is a simple classifier based on the famous Bayes theorem. Despite its
simplicity, it has remained a popular choice for text classification. The Naive Bayes classifier applies the well-
known Bayes theorem for conditional probability. In its simplest form, for two events A and B, Bayes' theorem
relates the two conditional probabilities as follows: P(A ∩ B) = P(A, B) = P(A) P(B | A) = P(B) P(A | B), which
implies P(B | A) = P(B) P(A | B) / P(A).
Season    Play = Yes   Play = No   P(Season | Yes)   P(Season | No)   P(Season)
Summer    3            2           3/9               2/5              5/14
Monsoon   4            0           4/9               0/5              4/14
Winter    2            3           2/9               3/5              5/14
Total     9            5
Likelihoods used in the example: P(Summer | Yes) = 3/9 = 0.33, P(Summer) = 5/14 = 0.36, and the prior
P(Yes) = 9/14 = 0.64.
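As an illustration of this example, the following R sketch rebuilds the frequency table from a small assumed data
vector and applies Bayes' theorem; the season and play values are hypothetical, chosen only to reproduce the counts
in the table above.

# Hypothetical observations matching the frequency table above
season <- c(rep("Summer", 5), rep("Monsoon", 4), rep("Winter", 5))
play   <- c("Yes", "Yes", "Yes", "No", "No",    # Summer: 3 yes, 2 no
            "Yes", "Yes", "Yes", "Yes",         # Monsoon: 4 yes, 0 no
            "Yes", "Yes", "No", "No", "No")     # Winter: 2 yes, 3 no

p_yes        <- mean(play == "Yes")                       # prior P(Yes) = 9/14
p_summer     <- mean(season == "Summer")                  # evidence P(Summer) = 5/14
p_summer_yes <- mean(season[play == "Yes"] == "Summer")   # likelihood P(Summer | Yes) = 3/9

# Bayes' theorem: P(Yes | Summer) = P(Summer | Yes) * P(Yes) / P(Summer)
p_summer_yes * p_yes / p_summer                           # 0.60, above the 0.5 threshold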
In a classification problem, we have some predictors (features / covariates / independent variables) and
an outcome (target / class / dependent variable) [9]. Each observation has values for the predictors and a class.
From these predictors and their associated classes, we want to learn so that, when new feature values are provided,
we can predict the class. In Naive Bayes, the algorithm evaluates a probability for each class given the predictor
values, and the observation is assigned to the class that is most probable. The full expression for this probability
is quite long and includes several conditional probabilities, but with the assumption of conditional independence it
can be reduced to a very simple form [10]. The assumption of conditional independence is that, given a class, the
predictor / feature values are independent of each other; there is no correlation between the features within a
class [11]. Mathematically, if the events A and B are independent conditional on the event C, then
P(A | B, C) = P(A | C). We then need the probability of each class for the given feature values. The denominator of
Bayes' theorem is a constant for fixed features, so it is sufficient to compare only the numerator across the
different classes; that is, we evaluate the numerator for every possible class value and, in the end, choose the
class with the highest value. For this we need the conditional probability distributions f(Xj | Ck) for
j = 1, 2, ..., n. If Xj is continuous, normality is a common assumption; if Xj is discrete, a common choice is the
multinomial distribution. Non-parametric (kernel) distributions can also be applied. Naive Bayes algorithms are
mainly used in sentiment analysis, spam filters, recommendation systems and the like. They are fast and easy to
implement, but their biggest drawback is that the predictors must be independent. In most real cases, predictors
are dependent, which hampers the performance of the classifier [11].
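To make the Gaussian assumption concrete, the short R sketch below computes a class posterior by hand for a single
continuous predictor; the means, standard deviations and priors used here are illustrative assumptions, not values
estimated from the paper's data.

classes <- c("class 0", "class 1")
mu      <- c(580, 620)    # assumed class means of the predictor
sigma   <- c(115, 110)    # assumed class standard deviations
prior   <- c(0.69, 0.31)  # assumed prior class probabilities

x_new <- 600              # a new value of the continuous predictor
# numerator of Bayes' theorem for each class: f(x | C_k) * P(C_k), using a normal density
numer <- dnorm(x_new, mean = mu, sd = sigma) * prior
posterior <- numer / sum(numer)   # normalize so the posteriors sum to one
setNames(round(posterior, 3), classes)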
2. USING R PROGRAMMING
For demonstration purposes, we build a Naive Bayes classifier here. We use the binary.csv data set, which contains
records for 400 students with four attributes: whether the student was admitted (admit), the GRE score (gre), the
grade point average (gpa), and the rank of the student's previous institution (rank).
> install.packages("naivebayes")
> library(naivebayes)
> library(dplyr)
> library(ggplot2)
> library(psych)
> library(caret)
> library(e1071)
> mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
This command loads the data set from an internet source, reading all rows of the data.
> head(mydata)
admit gre gpa rank
1 0 380 3.61 3
2 1 660 3.67 3
3 1 800 4.00 1
4 1 640 3.19 4
5 0 520 2.93 4
6 1 760 3.00 2
> data=mydata
> str(data)
'data.frame': 400 obs. of 4 variables:
$ admit: int 0 1 1 1 0 1 1 0 1 0 ...
$ gre : int 380 660 800 640 520 760 560 400 5
$ gpa : num 3.61 3.67 4 3.19 2.93 3 2.98 3.08 3
$ rank : int 3 3 1 4 4 2 1 2 3 2 ...
> colnames(data)
[1] "admit" "gre" "gpa" "rank"
These commands display the first rows, the structure, and the column names of the binary.csv data set.
> xtabs(~admit+rank,data=data)
rank
admit 1 2 3 4
0 28 97 93 55
1 33 54 28 12
The cross-tabulation of the categorical variables shows that 28 students of rank 1 were not admitted to the program,
while 33 candidates of the same rank were admitted.
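For easier interpretation, the counts can be converted into per-rank admission proportions; a small sketch using the
same cross-tabulation:

# Column-wise (per-rank) proportions of not admitted (0) and admitted (1)
round(prop.table(xtabs(~admit + rank, data = data), margin = 2), 2)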
> data$rank=as.factor(data$rank)
> data$admit=as.factor(data$admit)
The integer admit and rank variables were converted into categorical (factor) variables.
> pairs.panels(data[-1])
The pairs panel plot shows the correlation coefficients between the independent variables: gre and gpa have a
correlation of 0.38, whereas rank is negatively associated with both gpa and gre.
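The quoted correlation can also be checked directly on the numeric columns of the original data frame; a minimal
sketch:

# Pairwise correlations behind the panel plot; gre and gpa should be about 0.38
round(cor(mydata[, c("gre", "gpa", "rank")]), 2)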
> data%>%
+ ggplot(aes(x=admit,y=gre,fill=admit))+ geom_boxplot()+ ggtitle("Box Plot")
Fig: 2 Box Plot
With the categorical variable admit on the x axis and gre on the y axis, the box plot shows that the data are not
equally distributed across the two categories.
> data%>%
+ ggplot(aes(x=admit,y=gpa,fill=admit))+ geom_boxplot()+ ggtitle("Box Plot")
With admit on the x axis and gpa on the y axis, the box plot shows that the data vary considerably within each
category.
> data%>%
+ ggplot(aes(x=gre,fill=admit))+ geom_density(alpha=0.8,color='black')+ ggtitle("Density Plot")
The density plot of gre by admit shows a slight overlap between the two groups.
> data%>%
+ ggplot(aes(x=gpa,fill=admit))+ geom_density(alpha=0.8,color='black')+ ggtitle("Density Plot")
The density plot of gpa by admit likewise shows an overlap between the chosen predictor values for each category of
the target.
> set.seed(1234)
> ind=sample(2,nrow(data),replace=T, prob=c(.8,.2))
> train=data[ind==1,]
> test=data[ind==2,]
For the Naive Bayes classifier, the binary data set was randomly partitioned into an 80/20 split of training and
test sets from the 400 observations. This is important in this context because the prior probabilities are
calculated from the counts in the training data, so the split should preserve the class proportions of the parent
data set.
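A quick, optional check that the split keeps the class proportions close to those of the parent data set:

# Class proportions in the full, training and test data
prop.table(table(data$admit))
prop.table(table(train$admit))
prop.table(table(test$admit))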
The naïve Bayes classifier is built with the categorical variable admit as the target and all other variables as
predictors, and is stored in the object model.
> model=naiveBayes(admit~.,data=train )
> model
Naive Bayes Classifier for Discrete Predictors
Call:
naiveBayes.default(x = X, y = Y, laplace = laplace)
A-priori probabilities:
Y
0 1
0.6861538 0.3138462
Conditional probabilities:
gre
Y [,1] [,2]
0 578.6547 116.325
1 622.9412 110.924
gpa
Y [,1] [,2]
0 3.355247 0.3714542
1 3.533627 0.3457057
rank
Y 1 2 3 4
0 0.10313901 0.36771300 0.33183857 0.19730
1 0.24509804 0.42156863 0.24509804 0.08823
The model stores prior probabilities of about 69 percent for not admitted and 31 percent for admitted; the
conditional distributions of gre, gpa and rank are also calculated for each class.
> summary(model)
Length Class Mode
apriori 2 table numeric
tables 3 -none- list
levels 2 -none- character
isnumeric 3 -none- logical
call 4 -none- call
> # For the categorical predictor rank, the conditional probabilities P(rank = 1 | admit = 0) = 0.103 and
> # P(rank = 1 | admit = 1) = 0.245 are taken from the model's conditional probability table; the continuous
> # predictors use a normal distribution.
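The quoted conditional probabilities can also be read programmatically from the fitted object; tables and apriori
are standard components of an e1071 naiveBayes object.

model$tables$rank                    # P(rank | admit), e.g. P(rank = 1 | admit = 0) = 0.103
model$apriori / sum(model$apriori)   # prior class probabilities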
> p=predict(model,test)
The naiveBayes function assumes Gaussian distributions for the numeric variables, and the a-priori probabilities are
calculated from the class proportions of the training data; they are shown when the object is printed. For the
numeric predictors, the two columns against Y are the means and standard deviations within each class. A useful
feature of the naivebayes package is that kernel-based densities can be used for the continuous predictors instead.
> confusionMatrix(table(p,test$admit))
Confusion Matrix and Statistics
p 0 1
0 47 21
1 3 4
Accuracy : 0.68
95% CI : (0.5622, 0.7831)
No Information Rate : 0.6667
P-Value [Acc > NIR] : 0.4566734
Kappa : 0.122
Mcnemar's Test P-Value : 0.0005202
Sensitivity : 0.9400
Specificity : 0.1600
Pos Pred Value : 0.6912
Neg Pred Value : 0.5714
Prevalence : 0.6667
Detection Rate : 0.6267
Detection Prevalence : 0.9067
Balanced Accuracy : 0.5500
'Positive' Class : 0
Predictions on the test data were calculated and the confusion matrix shows 68 percent accuracy with a 95 percent
confidence interval of (0.56, 0.78). The p-value for accuracy against the no-information rate is not significant,
although McNemar's test p-value (0.0005) is well below .05. The sensitivity is much higher than the specificity,
which shows that the model identifies the majority (not admitted) class well; overall the model test is not that
bad, given that only a few predictors were used.
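The reported accuracy can be verified by hand from the confusion matrix above: correct predictions on the main
diagonal divided by the total number of test cases.

(47 + 4) / (47 + 21 + 3 + 4)   # 51/75 = 0.68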
> train%>%
+ filter(admit=="0")%>%
+ summarise(mean(gre),sd(gre))
mean(gre) sd(gre)
578.6547 116.325
The training data are filtered to admit = 0 (not admitted), and the mean and standard deviation of gre are
calculated; these match the values in the model's conditional probability table.
> train%>%
+ filter(admit=="1")%>%
+ summarise(mean(gre),sd(gre))
mean(gre) sd(gre)
622.9412 110.924
Similarly, the training data are filtered to admit = 1 (admitted), and the mean and standard deviation of gre are
calculated.
> model2=naive_bayes(admit~.,data=train, usekernel = T)
Fitting the naïve Bayes classifier in this alternative way, with kernel density estimation for the continuous
predictors (usekernel = TRUE), raises the training accuracy a little, with no change on the test set.
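A short sketch to verify this claim by computing the misclassification error of model2 on both sets, assuming the
train and test objects defined above (the extra admit column in the new data is simply ignored by predict):

p_train2 <- predict(model2, train)
p_test2  <- predict(model2, test)
1 - mean(p_train2 == train$admit)   # training error, expected slightly below 0.295
1 - mean(p_test2 == test$admit)     # test error, expected to stay around 0.32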
> plot(model2)
These plots show the fitted distribution of each predictor within each class: the admitted group is drawn in green
and the not-admitted group in red for each rank category.
> p1=predict(model,train)
> head(cbind(p1,train))
p1 admit gre gpa rank
1 0 0 380 3.61 3
2 0 1 660 3.67 3
3 1 1 800 4.00 1
4 0 1 640 3.19 4
6 0 1 760 3.00 2
7 0 1 560 2.98 1
The predict function generates predictions on the training data, and cbind combines the predicted class with the
original variables for comparison.
> (tab1=table(p1,train$admit))
p1 0 1
0 196 69
1 27 33
This confusion matrix shows that the main-diagonal elements are the cases where the model's prediction matches the
actual class, whereas the off-diagonal elements are misclassifications that indicate where the model needs
improvement.
> 1-sum(diag(tab1))/sum(tab1)
[1] 0.2953846
> p2=predict(model,test)
> (tab2=table(p2,test$admit))
p2 0 1
0 47 21
1 3 4
> 1-sum(diag(tab2))/sum(tab2)
[1] 0.32
The misclassification errors of about 29 percent on the training data and 32 percent on the test data are easily
calculated in this way; the naïve Bayes probability model therefore fits well when the data sets contain categorical
and numeric variables in a machine learning workflow.
3. CONCLUSION
A Naive Bayes classifier is a probabilistic machine learning model used for classification tasks. The classifier is
rooted in Bayes' theorem, which provides a probabilistic framework for the classification problem. It has a simple
and solid basis for modeling data and is quite robust to outliers and missing values. The algorithm is widely
implemented in text mining and document classification, where the application has a large set of attributes and
attribute values to compute, and it also serves as a good reference point for comparison with other models.
Implementing the Bayesian model in production systems is quite simple, and the use of data mining tools is optional.
An important limitation of the model is the assumption of independent attributes, which can be mitigated by more
advanced modeling or by decreasing the dependency between attributes through pre-processing.
REFERENCES
[1] M. Dodge, "Understanding Cyberspace Cartographies," A thesis submitted for the degree of Doctor of Philosophy, 2008.
[2] D. Liske, "Lyric Analysis: Predictive Analytics using Machine Learning with R," 2018.
[3] A. Sharma, "How Different are Conventional Programming and Machine Learning?," 2018.
[4] J. Brownlee, "Supervised and Unsupervised Machine Learning Algorithms," 2016.
[5] M. Sidana, "Types of classification algorithms in Machine Learning," 2017.
[6] M. D. B. J. M. D. L. R. C. P. C. A. S. B. B. Troy CS, "Sequence - Evolution - Function: Computational Approaches in
Comparative Genomics.," 2001.
[7] J. P. Jiawei Han, "Learn more about Naïve Bayesian Classifier," 2012.
[8] L. S. R. Arboretti Giancristofaro, "Model Performance Analysis and Model Validation," 2003.
[9] S. Arora, "Data Science vs. Big Data vs. Data Analytics," 2015.
[10] B. Deshpande, "2 main differences between classification and regression trees," 2011.
[11] R. Khan, "Naive Bayes Classifier: theory and R example," 2017.
[12] M. D. S. D. K. Malik Yousef, Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in
Plants, 2016.
More Related Content

What's hot

CS 402 DATAMINING AND WAREHOUSING -MODULE 5
CS 402 DATAMINING AND WAREHOUSING -MODULE 5CS 402 DATAMINING AND WAREHOUSING -MODULE 5
CS 402 DATAMINING AND WAREHOUSING -MODULE 5
NIMMYRAJU
 
An introduction to bayesian statistics
An introduction to bayesian statisticsAn introduction to bayesian statistics
An introduction to bayesian statistics
John Tyndall
 
Hamilton 1994 time series analysis
Hamilton 1994 time series analysisHamilton 1994 time series analysis
Hamilton 1994 time series analysis
Ozan Baskan
 
S&OP Introduction
S&OP IntroductionS&OP Introduction
S&OP Introduction
steelwedge2000
 
Demand Sensing: Finding The Right Balance
Demand Sensing: Finding The Right BalanceDemand Sensing: Finding The Right Balance
Demand Sensing: Finding The Right Balance
Spinnaker Management Group
 
Causal Inference Introduction.pdf
Causal Inference Introduction.pdfCausal Inference Introduction.pdf
Causal Inference Introduction.pdf
Yuna Koyama
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
Evangelos Kritsotakis
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
Kyunghoon Kim
 
Object Oriented Programming Notes.pdf
Object Oriented Programming Notes.pdfObject Oriented Programming Notes.pdf
Object Oriented Programming Notes.pdf
Priyanka542143
 
Time Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTime Series Analysis: Theory and Practice
Time Series Analysis: Theory and Practice
Tetiana Ivanova
 
Seasonal ARIMA
Seasonal ARIMASeasonal ARIMA
Seasonal ARIMA
Joud Khattab
 
CS 402 DATAMINING AND WAREHOUSING -PROBLEMS
CS 402 DATAMINING AND WAREHOUSING -PROBLEMSCS 402 DATAMINING AND WAREHOUSING -PROBLEMS
CS 402 DATAMINING AND WAREHOUSING -PROBLEMS
NIMMYRAJU
 
Time series-ppts.ppt
Time series-ppts.pptTime series-ppts.ppt
Time series-ppts.ppt
KetanSehdev2
 
Time series modelling arima-arch
Time series modelling  arima-archTime series modelling  arima-arch
Time series modelling arima-arch
jeevan solaskar
 
Change Point Analysis
Change Point AnalysisChange Point Analysis
Change Point Analysis
Taha Kass-Hout, MD, MS
 
Cours les indices de prix de la théorie à la pratique acp version publique 2013
Cours les indices de prix de la théorie à la pratique acp version publique 2013Cours les indices de prix de la théorie à la pratique acp version publique 2013
Cours les indices de prix de la théorie à la pratique acp version publique 2013Axelle Chauvet-Peyrard
 
Introduction to Bayesian Methods
Introduction to Bayesian MethodsIntroduction to Bayesian Methods
Introduction to Bayesian Methods
Corey Chivers
 
What is bayesian statistics and how is it different?
What is bayesian statistics and how is it different?What is bayesian statistics and how is it different?
What is bayesian statistics and how is it different?
Wayne Lee
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
Marco Righini
 

What's hot (20)

CS 402 DATAMINING AND WAREHOUSING -MODULE 5
CS 402 DATAMINING AND WAREHOUSING -MODULE 5CS 402 DATAMINING AND WAREHOUSING -MODULE 5
CS 402 DATAMINING AND WAREHOUSING -MODULE 5
 
An introduction to bayesian statistics
An introduction to bayesian statisticsAn introduction to bayesian statistics
An introduction to bayesian statistics
 
Hamilton 1994 time series analysis
Hamilton 1994 time series analysisHamilton 1994 time series analysis
Hamilton 1994 time series analysis
 
S&OP Introduction
S&OP IntroductionS&OP Introduction
S&OP Introduction
 
Demand Sensing: Finding The Right Balance
Demand Sensing: Finding The Right BalanceDemand Sensing: Finding The Right Balance
Demand Sensing: Finding The Right Balance
 
Causal Inference Introduction.pdf
Causal Inference Introduction.pdfCausal Inference Introduction.pdf
Causal Inference Introduction.pdf
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Object Oriented Programming Notes.pdf
Object Oriented Programming Notes.pdfObject Oriented Programming Notes.pdf
Object Oriented Programming Notes.pdf
 
Time Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTime Series Analysis: Theory and Practice
Time Series Analysis: Theory and Practice
 
Seasonal ARIMA
Seasonal ARIMASeasonal ARIMA
Seasonal ARIMA
 
CS 402 DATAMINING AND WAREHOUSING -PROBLEMS
CS 402 DATAMINING AND WAREHOUSING -PROBLEMSCS 402 DATAMINING AND WAREHOUSING -PROBLEMS
CS 402 DATAMINING AND WAREHOUSING -PROBLEMS
 
Time series-ppts.ppt
Time series-ppts.pptTime series-ppts.ppt
Time series-ppts.ppt
 
Time series modelling arima-arch
Time series modelling  arima-archTime series modelling  arima-arch
Time series modelling arima-arch
 
Change Point Analysis
Change Point AnalysisChange Point Analysis
Change Point Analysis
 
Cours les indices de prix de la théorie à la pratique acp version publique 2013
Cours les indices de prix de la théorie à la pratique acp version publique 2013Cours les indices de prix de la théorie à la pratique acp version publique 2013
Cours les indices de prix de la théorie à la pratique acp version publique 2013
 
Introduction to Bayesian Methods
Introduction to Bayesian MethodsIntroduction to Bayesian Methods
Introduction to Bayesian Methods
 
Topology ppt
Topology pptTopology ppt
Topology ppt
 
What is bayesian statistics and how is it different?
What is bayesian statistics and how is it different?What is bayesian statistics and how is it different?
What is bayesian statistics and how is it different?
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
 

Similar to Naïve Bayes Machine Learning Classification with R Programming: A case study of binary data sets

Precipitation’s Level Prediction Based on Tree Augmented Naïve Bayes model
Precipitation’s Level Prediction Based on Tree Augmented Naïve Bayes modelPrecipitation’s Level Prediction Based on Tree Augmented Naïve Bayes model
Precipitation’s Level Prediction Based on Tree Augmented Naïve Bayes model
Nooria Sukmaningtyas
 
Naive.pdf
Naive.pdfNaive.pdf
Naive.pdf
MahimMajee
 
Opinion mining framework using proposed RB-bayes model for text classication
Opinion mining framework using proposed RB-bayes model for text classicationOpinion mining framework using proposed RB-bayes model for text classication
Opinion mining framework using proposed RB-bayes model for text classication
IJECEIAES
 
Unit-2.ppt
Unit-2.pptUnit-2.ppt
Unit-2.ppt
AshwaniShukla47
 
Regression, Bayesian Learning and Support vector machine
Regression, Bayesian Learning and Support vector machineRegression, Bayesian Learning and Support vector machine
Regression, Bayesian Learning and Support vector machine
Dr. Radhey Shyam
 
IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...
IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...
IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...
IRJET Journal
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET Journal
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET Journal
 
probabilistic ranking
probabilistic rankingprobabilistic ranking
probabilistic rankingFELIX75
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
Adnan Masood
 
Slides distancecovariance
Slides distancecovarianceSlides distancecovariance
Slides distancecovarianceShrey Nishchal
 
Supervised algorithms
Supervised algorithmsSupervised algorithms
Supervised algorithms
Yassine Akhiat
 
On Extension of Weibull Distribution with Bayesian Analysis using S-Plus Soft...
On Extension of Weibull Distribution with Bayesian Analysis using S-Plus Soft...On Extension of Weibull Distribution with Bayesian Analysis using S-Plus Soft...
On Extension of Weibull Distribution with Bayesian Analysis using S-Plus Soft...
Dr. Amarjeet Singh
 
Naive_hehe.pptx
Naive_hehe.pptxNaive_hehe.pptx
Naive_hehe.pptx
MahimMajee
 
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET Journal
 
Data classification sammer
Data classification sammer Data classification sammer
Data classification sammer
Sammer Qader
 
Navies bayes
Navies bayesNavies bayes
Navies bayes
HassanRaza323
 
Dd31720725
Dd31720725Dd31720725
Dd31720725
IJERA Editor
 

Similar to Naïve Bayes Machine Learning Classification with R Programming: A case study of binary data sets (20)

Precipitation’s Level Prediction Based on Tree Augmented Naïve Bayes model
Precipitation’s Level Prediction Based on Tree Augmented Naïve Bayes modelPrecipitation’s Level Prediction Based on Tree Augmented Naïve Bayes model
Precipitation’s Level Prediction Based on Tree Augmented Naïve Bayes model
 
Naive.pdf
Naive.pdfNaive.pdf
Naive.pdf
 
Opinion mining framework using proposed RB-bayes model for text classication
Opinion mining framework using proposed RB-bayes model for text classicationOpinion mining framework using proposed RB-bayes model for text classication
Opinion mining framework using proposed RB-bayes model for text classication
 
Unit-2.ppt
Unit-2.pptUnit-2.ppt
Unit-2.ppt
 
Regression, Bayesian Learning and Support vector machine
Regression, Bayesian Learning and Support vector machineRegression, Bayesian Learning and Support vector machine
Regression, Bayesian Learning and Support vector machine
 
IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...
IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...
IRJET- Big Data and Bayes Theorem used Analyze the Student’s Performance in E...
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
probabilistic ranking
probabilistic rankingprobabilistic ranking
probabilistic ranking
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
 
Slides distancecovariance
Slides distancecovarianceSlides distancecovariance
Slides distancecovariance
 
Supervised algorithms
Supervised algorithmsSupervised algorithms
Supervised algorithms
 
Naive bayes classifier
Naive bayes classifierNaive bayes classifier
Naive bayes classifier
 
On Extension of Weibull Distribution with Bayesian Analysis using S-Plus Soft...
On Extension of Weibull Distribution with Bayesian Analysis using S-Plus Soft...On Extension of Weibull Distribution with Bayesian Analysis using S-Plus Soft...
On Extension of Weibull Distribution with Bayesian Analysis using S-Plus Soft...
 
Naive_hehe.pptx
Naive_hehe.pptxNaive_hehe.pptx
Naive_hehe.pptx
 
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
 
Data classification sammer
Data classification sammer Data classification sammer
Data classification sammer
 
Navies bayes
Navies bayesNavies bayes
Navies bayes
 
Dd31720725
Dd31720725Dd31720725
Dd31720725
 

More from SubmissionResearchpa

Harmony between society and personality, and its influence on the phenomenon ...
Harmony between society and personality, and its influence on the phenomenon ...Harmony between society and personality, and its influence on the phenomenon ...
Harmony between society and personality, and its influence on the phenomenon ...
SubmissionResearchpa
 
Establishment of the Institute and Waqf aspects of its development in Central...
Establishment of the Institute and Waqf aspects of its development in Central...Establishment of the Institute and Waqf aspects of its development in Central...
Establishment of the Institute and Waqf aspects of its development in Central...
SubmissionResearchpa
 
The history of the Kokand Khanate in the press of Turkestan (According to Sut...
The history of the Kokand Khanate in the press of Turkestan (According to Sut...The history of the Kokand Khanate in the press of Turkestan (According to Sut...
The history of the Kokand Khanate in the press of Turkestan (According to Sut...
SubmissionResearchpa
 
Of partial defects of the dental rows of dynamic study of the state of the mu...
Of partial defects of the dental rows of dynamic study of the state of the mu...Of partial defects of the dental rows of dynamic study of the state of the mu...
Of partial defects of the dental rows of dynamic study of the state of the mu...
SubmissionResearchpa
 
The essence and specifics of modern social and cultural activities in Karakal...
The essence and specifics of modern social and cultural activities in Karakal...The essence and specifics of modern social and cultural activities in Karakal...
The essence and specifics of modern social and cultural activities in Karakal...
SubmissionResearchpa
 
International commercial arbitration in Uzbekistan: current state and develop...
International commercial arbitration in Uzbekistan: current state and develop...International commercial arbitration in Uzbekistan: current state and develop...
International commercial arbitration in Uzbekistan: current state and develop...
SubmissionResearchpa
 
Obtaining higher fatty alcohols based on low molecular polyethylene and their...
Obtaining higher fatty alcohols based on low molecular polyethylene and their...Obtaining higher fatty alcohols based on low molecular polyethylene and their...
Obtaining higher fatty alcohols based on low molecular polyethylene and their...
SubmissionResearchpa
 
Re-positioning adult education for development to thrive in Nigeria
Re-positioning adult education for development to thrive in NigeriaRe-positioning adult education for development to thrive in Nigeria
Re-positioning adult education for development to thrive in Nigeria
SubmissionResearchpa
 
Re-thinking adult basic education in the 21st century
Re-thinking adult basic education in the 21st centuryRe-thinking adult basic education in the 21st century
Re-thinking adult basic education in the 21st century
SubmissionResearchpa
 
Uyghur folk singing genre
Uyghur folk singing genreUyghur folk singing genre
Uyghur folk singing genre
SubmissionResearchpa
 
Infantile cerebral palsy and dental anomalies
Infantile cerebral palsy and dental anomaliesInfantile cerebral palsy and dental anomalies
Infantile cerebral palsy and dental anomalies
SubmissionResearchpa
 
An innovative mechanisms to increase the effectiveness of independent educati...
An innovative mechanisms to increase the effectiveness of independent educati...An innovative mechanisms to increase the effectiveness of independent educati...
An innovative mechanisms to increase the effectiveness of independent educati...
SubmissionResearchpa
 
The role of radiation diagnostic methods in pathological changes of the hip j...
The role of radiation diagnostic methods in pathological changes of the hip j...The role of radiation diagnostic methods in pathological changes of the hip j...
The role of radiation diagnostic methods in pathological changes of the hip j...
SubmissionResearchpa
 
Idealistic study of proverbs
Idealistic study of proverbsIdealistic study of proverbs
Idealistic study of proverbs
SubmissionResearchpa
 
Positive and negative features of mythological images in the epics “Beowulf” ...
Positive and negative features of mythological images in the epics “Beowulf” ...Positive and negative features of mythological images in the epics “Beowulf” ...
Positive and negative features of mythological images in the epics “Beowulf” ...
SubmissionResearchpa
 
Special three processes of production and implementation
Special three processes of production and implementationSpecial three processes of production and implementation
Special three processes of production and implementation
SubmissionResearchpa
 
Improving traditional methods of teaching chemistry
Improving traditional methods of teaching chemistryImproving traditional methods of teaching chemistry
Improving traditional methods of teaching chemistry
SubmissionResearchpa
 
Expression of spiritual experiences in art
Expression of spiritual experiences in artExpression of spiritual experiences in art
Expression of spiritual experiences in art
SubmissionResearchpa
 
Natural emergencies
Natural emergenciesNatural emergencies
Natural emergencies
SubmissionResearchpa
 
Due to intolerance of dental materials used for therapeutic treatment
Due to intolerance of dental materials used for therapeutic treatmentDue to intolerance of dental materials used for therapeutic treatment
Due to intolerance of dental materials used for therapeutic treatment
SubmissionResearchpa
 

More from SubmissionResearchpa (20)

Harmony between society and personality, and its influence on the phenomenon ...
Harmony between society and personality, and its influence on the phenomenon ...Harmony between society and personality, and its influence on the phenomenon ...
Harmony between society and personality, and its influence on the phenomenon ...
 
Establishment of the Institute and Waqf aspects of its development in Central...
Establishment of the Institute and Waqf aspects of its development in Central...Establishment of the Institute and Waqf aspects of its development in Central...
Establishment of the Institute and Waqf aspects of its development in Central...
 
The history of the Kokand Khanate in the press of Turkestan (According to Sut...
The history of the Kokand Khanate in the press of Turkestan (According to Sut...The history of the Kokand Khanate in the press of Turkestan (According to Sut...
The history of the Kokand Khanate in the press of Turkestan (According to Sut...
 
Of partial defects of the dental rows of dynamic study of the state of the mu...
Of partial defects of the dental rows of dynamic study of the state of the mu...Of partial defects of the dental rows of dynamic study of the state of the mu...
Of partial defects of the dental rows of dynamic study of the state of the mu...
 
The essence and specifics of modern social and cultural activities in Karakal...
The essence and specifics of modern social and cultural activities in Karakal...The essence and specifics of modern social and cultural activities in Karakal...
The essence and specifics of modern social and cultural activities in Karakal...
 
International commercial arbitration in Uzbekistan: current state and develop...
International commercial arbitration in Uzbekistan: current state and develop...International commercial arbitration in Uzbekistan: current state and develop...
International commercial arbitration in Uzbekistan: current state and develop...
 
Obtaining higher fatty alcohols based on low molecular polyethylene and their...
Obtaining higher fatty alcohols based on low molecular polyethylene and their...Obtaining higher fatty alcohols based on low molecular polyethylene and their...
Obtaining higher fatty alcohols based on low molecular polyethylene and their...
 
Re-positioning adult education for development to thrive in Nigeria
Re-positioning adult education for development to thrive in NigeriaRe-positioning adult education for development to thrive in Nigeria
Re-positioning adult education for development to thrive in Nigeria
 
Re-thinking adult basic education in the 21st century
Re-thinking adult basic education in the 21st centuryRe-thinking adult basic education in the 21st century
Re-thinking adult basic education in the 21st century
 
Uyghur folk singing genre
Uyghur folk singing genreUyghur folk singing genre
Uyghur folk singing genre
 
Infantile cerebral palsy and dental anomalies
Infantile cerebral palsy and dental anomaliesInfantile cerebral palsy and dental anomalies
Infantile cerebral palsy and dental anomalies
 
An innovative mechanisms to increase the effectiveness of independent educati...
An innovative mechanisms to increase the effectiveness of independent educati...An innovative mechanisms to increase the effectiveness of independent educati...
An innovative mechanisms to increase the effectiveness of independent educati...
 
The role of radiation diagnostic methods in pathological changes of the hip j...
The role of radiation diagnostic methods in pathological changes of the hip j...The role of radiation diagnostic methods in pathological changes of the hip j...
The role of radiation diagnostic methods in pathological changes of the hip j...
 
Idealistic study of proverbs
Idealistic study of proverbsIdealistic study of proverbs
Idealistic study of proverbs
 
Positive and negative features of mythological images in the epics “Beowulf” ...
Positive and negative features of mythological images in the epics “Beowulf” ...Positive and negative features of mythological images in the epics “Beowulf” ...
Positive and negative features of mythological images in the epics “Beowulf” ...
 
Special three processes of production and implementation
Special three processes of production and implementationSpecial three processes of production and implementation
Special three processes of production and implementation
 
Improving traditional methods of teaching chemistry
Improving traditional methods of teaching chemistryImproving traditional methods of teaching chemistry
Improving traditional methods of teaching chemistry
 
Expression of spiritual experiences in art
Expression of spiritual experiences in artExpression of spiritual experiences in art
Expression of spiritual experiences in art
 
Natural emergencies
Natural emergenciesNatural emergencies
Natural emergencies
 
Due to intolerance of dental materials used for therapeutic treatment
Due to intolerance of dental materials used for therapeutic treatmentDue to intolerance of dental materials used for therapeutic treatment
Due to intolerance of dental materials used for therapeutic treatment
 

Recently uploaded

Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 

Recently uploaded (20)

Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 

Naïve Bayes Machine Learning Classification with R Programming: A case study of binary data sets

  • 1. INTERNATIONAL JOURNAL ON ORANGE TECHNOLOGIES (IJOT) e-ISSN: 2615-814 Volume: 01 Issue: 02 | Nov-Dec 2019 www.researchparks.org p-ISSN: 2615-7071 © 2019, IJOT | Research Parks Publishing (IDEAS Lab) | Page 27 Naïve Bayes Machine Learning Classification with R Programming: A case study of binary data sets Yagyanath Rimal1 1School of Engineering, Pokhara University, Nepal E-mail: rimal.yagya@pu.edu.np ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - This analytical review paper clearly explains Naïve Bayes machine learning techniques for simple probabilistic classification based on bayes theorem with the assumption of independence between the characteristics using r programming. Although there is large gap between which algorithm is suitable for data analysis when there was large categorical variable to be predict the value in research data. The model is trained in the training data set to make predictions on the test data sets for the implementation of the Naïve Bayes classification. The uniqueness of the technique is that gets new information and tries to make a better forecast by considering the new evidence when the input variable is of largely categorical in nature that is quite similar to how our human mind works while selecting proper judgement from various alternative of choices and can be applied in the neuronal network of the human brain does using r programming. Here researcher takes binary.csv data sets of 400 observations of 4 dependent attributes of educational data sets. Admit is dependent variable of gre, score gpa and rank of previous grade which ultimately determine whether student will be admitted or not for next program. Initially the gra and gpa variables has 0.36 percent significant in the association with rank categorical variable. The box plot and density plot demonstrate the data overlap between admitted and not admitted data sets. The naïve Bayes classification model classify the large data with 0.68 percent for not admitted where as 0.31 percent were admitted. The confusion matrix, and the prediction were calculated with 0.68 percent accuracy when 95 percent confidence interval. Similarly, the training accuracy is increased from 29 percent to 32 percent when naïve Bayes algorithm method as use kernel is equal to TRUE that ultimately decrease misclassification errors in the binary data sets. Key Words: Naive Bayes Classifier, Supervised Learning. 1. INTRODUCTION Machine learning is the study of algorithms that can learn from input data and therefore predict data for future analysis. They use a complex model and an algorithm they learn by themselves to predict, which in commercial use is known as predictive analysis [1]. Based on the statistical data of the statisticians in the form of variables, the data were processed through traditional programming [2]. Google, Facebook and other social networks used to a large extent the classification algorithm of automatic design pages. Although there were two types of supervised learning and an unsupervised algorithm [3]. The supervised learning mainly focuses on the classification of the set of identification categories that support the set of training data that contain observations that, at a certain point, classify the variables of the research data into different classes. It is a classification technique based on Bayes' theorem with a hypothesis of independence among predictors [4]. 
In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature [5]. The classifier rests on Bayes' theorem from probability theory.
Probability is the chance that one event among several possible cases occurs; it relates directly to everyday situations and helps solve many practical problems. The probability of a single event is the proportion of cases in which that event occurs, and the joint probability of a group of events is the proportion of cases in which those events occur together [6]. For classification, the conditional probabilities are first computed for each class ("yes" and "no"), the conditional probabilities of the observed feature values are multiplied within each class, and the products are normalized; the class with the larger posterior probability becomes the prediction.

As a simple example, consider a data set of football players' playing habits across seasons (summer, monsoon, winter). From the frequency of play in each season a likelihood table is built, and the posterior probability for any season follows directly [3]. For instance, the probability of play in summer is P(Yes | Summer) = P(Summer | Yes) × P(Yes) / P(Summer) = (3/9 × 9/14) / (5/14) ≈ 0.60, which is compared with the usual 0.5 threshold, so the prediction is "yes". When several conditions apply at once, the class-conditional probabilities of the individual features are multiplied within each class before the comparison is made.

In general, the naive Bayes classifier evaluates P(C | X) = P(X | C) × P(C) / P(X): the likelihood multiplied by the prior probability of the class, divided by the prior probability of the predictor. Here P(C) is the prior probability, which expresses how strongly we believe in each class before seeing the data; P(X) is a normalizing constant that makes the posterior density integrate to one; the posterior P(C | X) expresses how strongly we believe that a given class describes the situation, given the available data and all prior information; and P(X | C) is the likelihood, which describes how well each class predicts the data [7].

Machine learning is a category of algorithms that allows software applications to become more accurate at predicting outcomes without being explicitly programmed: algorithms receive input data, use statistical analysis to predict an output, and update their outputs as new data become available. The Naive Bayes classifier is an important tool for large-data analysis and data-science work [8]. R is a free software environment for statistical computing and graphics that is widely used in academia and industry. Despite its simplicity, the Naive Bayes classifier remains a popular choice for text classification; it applies the well-known Bayes theorem for conditional probability.
In its simplest form, for events A and B, Bayes' theorem relates two conditional probabilities as follows: P(A ∩ B) = P(A, B) = P(A) P(B | A) = P(B) P(A | B), which implies P(B | A) = P(B) P(A | B) / P(A).

Frequency table for the play example:

Season    Play = Yes   Play = No
Summer    3            2
Monsoon   4            0
Winter    2            3
Total     9            5

Likelihood table:

Season    P(Season | Yes)   P(Season | No)   P(Season)
Summer    3/9               2/5              5/14
Monsoon   4/9               0/5              4/14
Winter    2/9               3/5              5/14

From these tables, P(Summer | Yes) = 3/9 ≈ 0.33, P(Summer) = 5/14 ≈ 0.36 and P(Yes) = 9/14 ≈ 0.64.
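The same posterior can be reproduced in a few lines of R. This is only an illustrative sketch using the counts from the toy play/season table above, not the binary.csv data analysed later.

> # Toy play/season table: reproduce P(Yes | Summer) by hand
> p_yes <- 9/14                # prior P(Yes)
> p_summer <- 5/14             # evidence P(Summer)
> p_summer_given_yes <- 3/9    # likelihood P(Summer | Yes)
> p_summer_given_yes * p_yes / p_summer
[1] 0.6

Since 0.6 exceeds the 0.5 threshold, the predicted class is "yes".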
In a classification problem we have some predictors (features / covariates / independent variables) and an outcome (target / class / dependent variable) [9]. Each observation carries values for the predictors together with a class label. From these predictors and their associated classes we want to learn a rule so that, given new feature values, we can predict the class. Naive Bayes evaluates a posterior probability for each class from the observed predictor values and assigns the observation to the most probable class. The full expression for that posterior is long and involves several conditional probabilities, but under the assumption of conditional independence it reduces to a very simple form [10]. The conditional independence assumption states that, given the class, the predictor values are independent of one another; there is no correlation between the features within a class [11]. Mathematically, if events A and B are independent conditional on event C, then P(A | B, C) = P(A | C). Because the denominator P(X) is constant for a fixed set of feature values, it is enough to compare only the numerators across classes: we evaluate the numerator for every possible class and choose the class with the highest value. For this we need the conditional distributions f(Xj | Ck) for j = 1, 2, ..., n. If Xj is continuous, normality is a common assumption; if Xj is discrete, a multinomial distribution is commonly assumed; non-parametric (kernel) density estimates can also be used. Naive Bayes algorithms are mainly used in sentiment analysis, spam filtering, recommender systems and similar tasks. They are fast and easy to implement, but their biggest drawback is the assumption that the predictors are independent; in most real cases the predictors are dependent, which hampers the performance of the classifier [11].
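To make the decision rule concrete, the following is a minimal hand-rolled sketch of the naive Bayes scoring step under the independence assumption. It is for illustration only; the object names (priors, likelihoods, new_obs) are hypothetical, and in practice the per-feature likelihood tables would be estimated from training data rather than typed in.

> # Minimal naive Bayes decision rule: compare un-normalized posteriors per class
> priors <- c(yes = 9/14, no = 5/14)                     # class priors P(C)
> likelihoods <- list(                                    # P(feature value | C) per feature
+   season = list(yes = c(Summer = 3/9, Monsoon = 4/9, Winter = 2/9),
+                 no  = c(Summer = 2/5, Monsoon = 0/5, Winter = 3/5)))
> new_obs <- c(season = "Summer")                         # observation to classify
> score <- sapply(names(priors), function(cl) {
+   # multiply the class-conditional probabilities of all observed features, then the prior
+   prod(sapply(names(new_obs), function(f) likelihoods[[f]][[cl]][new_obs[[f]]])) * priors[[cl]]
+ })
> round(score / sum(score), 3)   # normalized posteriors: yes = 0.6, no = 0.4

The class with the larger posterior ("yes" here) is returned as the prediction.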
2. USING R PROGRAMMING

For demonstration purposes we build a Naive Bayes classifier on a data set of 400 student records with four attributes: the admission decision (admit), the GRE score (gre), the grade point average (gpa) and the rank of the student's previous institution (rank).

> install.packages("naivebayes")
> library(naivebayes)
> library(dplyr)
> library(ggplot2)
> library(psych)
> library(caret)
> library(e1071)
> mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")

This command loads all rows of the data from the internet source.

> head(mydata)
  admit gre  gpa rank
1     0 380 3.61    3
2     1 660 3.67    3
3     1 800 4.00    1
4     1 640 3.19    4
5     0 520 2.93    4
6     1 760 3.00    2
> data=mydata
> str(data)
'data.frame': 400 obs. of 4 variables:
 $ admit: int 0 1 1 1 0 1 1 0 1 0 ...
 $ gre  : int 380 660 800 640 520 760 560 400 540 700 ...
 $ gpa  : num 3.61 3.67 4 3.19 2.93 3 2.98 3.08 3.39 3.92 ...
 $ rank : int 3 3 1 4 4 2 1 2 3 2 ...
> colnames(data)
[1] "admit" "gre" "gpa" "rank"

These commands display the first rows, the structure and the column names of the binary.csv data set.

> xtabs(~admit+rank,data=data)
     rank
admit  1  2  3  4
    0 28 97 93 55
    1 33 54 28 12

The two-way table of the categorical variables shows, for example, that 28 rank-1 students were not admitted to the program while 33 rank-1 students were admitted.

> data$rank=as.factor(data$rank)
> data$admit=as.factor(data$admit)

The integer variables admit and rank are converted to categorical factor variables.
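Before modelling, it can also be worth confirming the class balance and checking for missing values. This brief check is optional and not part of the original transcript; the proportions follow directly from the xtabs table above.

> prop.table(table(data$admit))   # share of not admitted (0) vs admitted (1)
     0      1 
0.6825 0.3175 
> colSums(is.na(data))            # count of missing values per column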
> pairs.panels(data[-1])

The scatter-plot matrix shows the correlation coefficients between the independent variables: gre and gpa have a correlation of about 0.38, whereas rank is negatively associated with both gpa and gre.

> data %>%
+   ggplot(aes(x=admit, y=gre, fill=admit)) + geom_boxplot() + ggtitle("Box Plot")

Fig. 2: Box Plot

With the categorical variable admit on the x axis and gre on the y axis, the box plot shows that the gre values are not distributed identically across the two categories.

> data %>%
+   ggplot(aes(x=admit, y=gpa, fill=admit)) + geom_boxplot() + ggtitle("Box Plot")

The corresponding box plot of gpa by admit likewise shows considerable variation within each group.

> data %>%
+   ggplot(aes(x=gre, fill=admit)) + geom_density(alpha=0.8, color='black') + ggtitle("Density Plot")

The density plot of gre by admission status shows a slight overlap between the two groups.

> data %>%
+   ggplot(aes(x=gpa, fill=admit)) + geom_density(alpha=0.8, color='black') + ggtitle("Density Plot")

The density plot of gpa by admission status also shows overlap between the chosen predictor's distributions for the two categories of the target.

> set.seed(1234)
> ind=sample(2,nrow(data),replace=T, prob=c(.8,.2))
> train=data[ind==1,]
> test=data[ind==2,]

The 400 observations of the binary data set are randomly partitioned into roughly 80 percent training and 20 percent test sets. This is important in this context because the prior probabilities are estimated from the class counts in the training data, so the training partition should follow the class proportions of the parent data set.
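Since the caret package is already loaded, a stratified split that preserves the admit proportions exactly could alternatively be made with createDataPartition. This is a suggested variant, not part of the original workflow:

> set.seed(1234)
> idx <- createDataPartition(data$admit, p = 0.8, list = FALSE)   # stratified on admit
> train <- data[idx, ]
> test  <- data[-idx, ]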
The Naive Bayes classifier is now fitted with the categorical variable admit as the response and all remaining variables as predictors, and the fitted model is stored.

> model=naiveBayes(admit~.,data=train)
> model

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace)

A-priori probabilities:
Y
        0         1 
0.6861538 0.3138462 

Conditional probabilities:
   gre
Y       [,1]    [,2]
  0 578.6547 116.325
  1 622.9412 110.924

   gpa
Y       [,1]      [,2]
  0 3.355247 0.3714542
  1 3.533627 0.3457057

   rank
Y            1          2          3          4
  0 0.10313901 0.36771300 0.33183857 0.19730
  1 0.24509804 0.42156863 0.24509804 0.08823

The model stores prior probabilities of about 69 percent not admitted and 31 percent admitted; for gre, gpa and rank it stores the corresponding class-conditional parameters.

> summary(model)
          Length Class  Mode
apriori   2      table  numeric
tables    3      -none- list
levels    2      -none- character
isnumeric 3      -none- logical
call      4      -none- call
> # When a predictor is categorical, the conditional proportions are used directly,
> # e.g. p(rank=1 | admit=0)=0.103 and p(rank=1 | admit=1)=0.245 from the table above.
> p=predict(model,test)

The naiveBayes function assumes Gaussian distributions for the numeric predictors, and the priors are estimated from the class proportions in the training data; they are shown when the model object is printed. The two columns under each numeric predictor are the within-class means and standard deviations. A useful feature of the naivebayes package is that kernel-based densities can be used for the continuous predictors instead.
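For individual observations the posterior class probabilities can also be inspected directly. This optional check, not in the original transcript, uses the type = "raw" option of e1071's predict method:

> head(predict(model, test, type = "raw"))   # posterior P(admit=0) and P(admit=1) per test row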
> confusionMatrix(table(p,test$admit))
Confusion Matrix and Statistics

p    0  1
  0 47 21
  1  3  4

               Accuracy : 0.68
                 95% CI : (0.5622, 0.7831)
    No Information Rate : 0.6667
    P-Value [Acc > NIR] : 0.4566734
                  Kappa : 0.122
 Mcnemar's Test P-Value : 0.0005202
            Sensitivity : 0.9400
            Specificity : 0.1600
         Pos Pred Value : 0.6912
         Neg Pred Value : 0.5714
             Prevalence : 0.6667
         Detection Rate : 0.6267
   Detection Prevalence : 0.9067
      Balanced Accuracy : 0.5500
       'Positive' Class : 0

The confusion matrix for the test predictions gives an accuracy of 68 percent with a 95 percent confidence interval of (0.56, 0.78). The p-value against the no-information rate (0.457) indicates that this accuracy is not significantly better than always predicting the majority class, and the large gap between sensitivity (0.94) and specificity (0.16) shows that the model identifies non-admitted students far better than admitted ones, which is not unreasonable given that only a few predictors are used.

> train %>%
+   filter(admit=="0") %>%
+   summarise(mean(gre),sd(gre))
  mean(gre)  sd(gre)
1  578.6547  116.325

The training data are filtered to the non-admitted group (admit = 0) and the mean and standard deviation of gre are computed; they match the conditional parameters stored in the model.

> train %>%
+   filter(admit=="1") %>%
+   summarise(mean(gre),sd(gre))
  mean(gre)  sd(gre)
1  622.9412  110.924

Similarly, the training data are filtered to the admitted group (admit = 1) and the same statistics are computed.

> model2=naive_bayes(admit~.,data=train, usekernel = T)

A second model is fitted with the naive_bayes function from the naivebayes package using kernel density estimates for the continuous predictors (usekernel = TRUE); this raises the training accuracy slightly, with no change on the test set.

> plot(model2)

The resulting plots show the fitted class-conditional distributions for each predictor, with the green curves corresponding to admitted students and the red curves to not-admitted students in each rank category.

> p1=predict(model,train)
> head(cbind(p1,train))
  p1 admit gre  gpa rank
1  0     0 380 3.61    3
2  0     1 660 3.67    3
3  1     1 800 4.00    1
4  0     1 640 3.19    4
6  0     1 760 3.00    2
7  0     1 560 2.98    1

The training predictions are bound column-wise to the training observations so that predicted and actual classes can be compared side by side; the training confusion table is tabulated below.
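Before tabulating the training errors below, the Gaussian fit (model) and the kernel fit (model2) can also be compared side by side. This is a small optional sketch, not part of the original transcript, assuming both model objects are still in memory; the helper name err is made up for illustration.

> # Training misclassification rate of the Gaussian and kernel models
> err <- function(fit) mean(predict(fit, train[, c("gre","gpa","rank")]) != train$admit)
> c(gaussian = err(model), kernel = err(model2))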
> (tab1=table(p1,train$admit))

p1    0   1
  0 196  69
  1  27  33

In this training confusion matrix the main-diagonal entries are the correctly classified observations, while the off-diagonal entries are misclassifications that would need further correction of the data or model.

> 1-sum(diag(tab1))/sum(tab1)
[1] 0.2953846
> p2=predict(model,test)
> (tab2=table(p2,test$admit))

p2   0  1
  0 47 21
  1  3  4
> 1-sum(diag(tab2))/sum(tab2)
[1] 0.32

The misclassification rates, calculated directly from the confusion tables, are therefore about 29.5 percent on the training data and 32 percent on the test data; naive Bayes model fitting of this kind is well suited to data sets whose predictors are largely categorical.

3. CONCLUSION

A Naive Bayes classifier is a probabilistic machine learning model used for classification tasks, rooted in Bayes' theorem. The Bayesian algorithm provides a probabilistic framework for a classification problem. It has a simple and solid basis for modelling data and is quite robust to outliers and missing values. The algorithm is widely applied in text mining and document classification, where there are large numbers of attributes and attribute values to handle, and it also serves as a good baseline for comparison with other models. Implementing a Bayesian model in production systems is straightforward, and specialised data-mining tools are optional. An important limitation of the model is the assumption of independent attributes, which can be mitigated by more advanced modelling or by reducing the dependence between attributes through pre-processing.

REFERENCES
[1] M. Dodge, "Understanding Cyberspace Cartographies," PhD thesis, 2008.
[2] D. Liske, "Lyric Analysis: Predictive Analytics using Machine Learning with R," 2018.
[3] A. Sharma, "How Different are Conventional Programming and Machine Learning?," 2018.
[4] J. Brownlee, "Supervised and Unsupervised Machine Learning Algorithms," 2016.
[5] M. Sidana, "Types of classification algorithms in Machine Learning," 2017.
[6] C. S. Troy et al., "Sequence - Evolution - Function: Computational Approaches in Comparative Genomics," 2001.
[7] J. Han and J. Pei, "Learn more about Naïve Bayesian Classifier," 2012.
[8] R. Arboretti Giancristofaro and L. Salmaso, "Model Performance Analysis and Model Validation," 2003.
[9] S. Arora, "Data Science vs. Big Data vs. Data Analytics," 2015.
[10] B. Deshpande, "2 main differences between classification and regression trees," 2011.
[11] R. Khan, "Naive Bayes Classifier: theory and R example," 2017.
[12] M. Yousef et al., "Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants," 2016.