ABHIJIT SENGUPTA
MBA(D), IVth Semester,
Roll no: 79
Naïve Bayes Classification Model

A Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model". In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or on the existence of the other features, a naive Bayes classifier treats all of these properties as contributing independently to the probability that the fruit is an apple.

Probabilistic model

Abstractly, the probability model for a classifier is a conditional model P(C / F1, ..., Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable. Using Bayes' theorem, this can be written

P(C / F1, ..., Fn) = P(C) * P(F1, ..., Fn / C) / P(F1, ..., Fn)

In plain English, using Bayesian probability terminology, the above equation can be written as

posterior = (prior * likelihood) / evidence
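As a quick numeric sketch of Bayes' theorem in the form above: with a single observed feature F and two classes, the prior and likelihood values below are illustrative assumptions, not taken from the text.

```python
# Toy check of posterior = prior * likelihood / evidence.
# The numbers here are made up for illustration.
prior = {"C1": 0.3, "C2": 0.7}           # p(C)
likelihood = {"C1": 0.8, "C2": 0.1}      # p(F / C) for one observed feature F

# evidence p(F) = sum over classes of prior * likelihood
evidence = sum(prior[c] * likelihood[c] for c in prior)
posterior = {c: prior[c] * likelihood[c] / evidence for c in prior}

print(round(posterior["C1"], 3))  # 0.774
```

Note that even though C1 has the smaller prior, its much larger likelihood makes it the more probable class once F is observed.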
In practice, there is interest only in the numerator of that fraction, because the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model

P(C, F1, ..., Fn)

which can be rewritten as follows, using the chain rule for repeated applications of the definition of conditional probability:

P(C, F1, ..., Fn) = P(C) * P(F1 / C) * P(F2 / C, F1) * P(F3 / C, F1, F2) * ... * P(Fn / C, F1, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj (j not equal to i), given the category C. This means that

P(Fi / C, Fj) = P(Fi / C)

and so on for every such pair. Thus, the joint model can be expressed as

P(C, F1, ..., Fn) = P(C) * P(F1 / C) * P(F2 / C) * ... * P(Fn / C)
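The factorized joint model above is still a proper probability distribution. A small numeric sketch with two binary features can verify this; the probability tables below are illustrative assumptions, not from the text.

```python
# Sketch of the naive factorization p(C, F1, F2) = p(C) * p(F1/C) * p(F2/C),
# with made-up probability tables for two classes and two binary features.
p_c = {"yes": 0.6, "no": 0.4}     # prior p(C)
p_f1 = {"yes": 0.7, "no": 0.2}    # p(F1 = 1 / C)
p_f2 = {"yes": 0.1, "no": 0.5}    # p(F2 = 1 / C)

def joint(c, f1, f2):
    # p(C, F1, F2) under the naive conditional-independence assumption
    t1 = p_f1[c] if f1 else 1 - p_f1[c]
    t2 = p_f2[c] if f2 else 1 - p_f2[c]
    return p_c[c] * t1 * t2

# the factorized joint still sums to 1 over all (c, f1, f2) combinations
total = sum(joint(c, f1, f2) for c in p_c for f1 in (0, 1) for f2 in (0, 1))
print(round(total, 10))  # 1.0
```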
This means that under the above independence assumptions, the conditional distribution over the class variable C is:

P(C / F1, ..., Fn) = (1/Z) * P(C) * P(F1 / C) * P(F2 / C) * ... * P(Fn / C)

where the evidence Z = P(F1, ..., Fn) is a scaling factor dependent only on F1, ..., Fn, that is, a constant if the values of the feature variables are known. Since the denominator Z is constant, we can compare the numerators alone to determine which outcome or class C is most likely given a set of values of the Fi. Let us discuss this with the following example.

Example

In the following table there are four attributes (age, income, student and credit rating) over 14 tuples. The records R1, R2, ..., R14 are described by their attributes. Let us consider a tuple X having the following attributes:

X = (age = youth, income = medium, student = yes, credit_rating = fair)

Will a person belonging to tuple X buy a computer?

Records  AGE    Income  Student  Credit rating  Buys Computer
R1       <=30   high    no       fair           no
R2       <=30   high    no       excellent      no
R3       31-40  high    no       fair           yes
R4       >40    medium  no       fair           yes
R5       >40    low     yes      fair           yes
R6       >40    low     yes      excellent      no
R7       31-40  low     yes      excellent      yes
R8       <=30   medium  no       fair           no
R9       <=30   low     yes      fair           yes
R10      >40    medium  yes      fair           yes
R11      <=30   medium  yes      excellent      yes
R12      31-40  medium  no       excellent      yes
R13      31-40  high    yes      fair           yes
R14      >40    medium  no       excellent      no

D: the set of training tuples. Each tuple is an n-dimensional attribute vector X = (x1, x2, x3, ..., xn). Let there be m classes C1, C2, C3, ..., Cm. The naive Bayes classifier predicts that X belongs to class Ci iff

P(Ci / X) > P(Cj / X) for 1 <= j <= m, j not equal to i.

Maximum posteriori hypothesis:

P(Ci / X) = P(X / Ci) * P(Ci) / P(X)

so we maximize P(X / Ci) * P(Ci), as P(X) is constant. With many attributes, it is computationally expensive to evaluate P(X / Ci) directly, so we apply the naive assumption of "class conditional independence":

P(X / Ci) = P(x1 / Ci) * P(x2 / Ci) * ... * P(xn / Ci)

Theory applied on the previous example:

P(C1) = P(buys_computer = yes) = 9/14 = 0.643
P(C2) = P(buys_computer = no) = 5/14 = 0.357
P(age = youth / buys_computer = yes) = 2/9 = 0.222
P(age = youth / buys_computer = no) = 3/5 = 0.600
P(income = medium / buys_computer = yes) = 4/9 = 0.444
P(income = medium / buys_computer = no) = 2/5 = 0.400
P(student = yes / buys_computer = yes) = 6/9 = 0.667
P(student = yes / buys_computer = no) = 1/5 = 0.200
P(credit rating = fair / buys_computer = yes) = 6/9 = 0.667
P(credit rating = fair / buys_computer = no) = 2/5 = 0.400
P(X / buys_computer = yes) = P(age = youth / buys_computer = yes) * P(income = medium / buys_computer = yes) * P(student = yes / buys_computer = yes) * P(credit rating = fair / buys_computer = yes) = 0.222 * 0.444 * 0.667 * 0.667 = 0.044

Similarly, P(X / buys_computer = no) = 0.600 * 0.400 * 0.200 * 0.400 = 0.019

Now, find the class Ci that maximizes P(X / Ci) * P(Ci):

P(X / buys_computer = yes) * P(buys_computer = yes) = 0.044 * 0.643 = 0.028
P(X / buys_computer = no) * P(buys_computer = no) = 0.019 * 0.357 = 0.007

Prediction: the person in tuple X buys a computer.

In spite of their naive design and apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations. In 2004, an analysis of the Bayesian classification problem showed that there are some theoretical reasons for the apparently unreasonable efficacy of naive Bayes classifiers. Still, a comprehensive comparison with other classification methods in 2006 showed that Bayes classification is outperformed by more current approaches, such as boosted trees or random forests.

An advantage of the naive Bayes classifier is that it requires only a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because the variables are assumed independent, only the variances of the variables for each class need to be determined, not the entire covariance matrix. It is particularly suited to problems where the dimensionality of the inputs is high. Parameter estimation for naive Bayes models typically uses the method of maximum likelihood.
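The worked example above can be reproduced in a few lines: estimate the class priors and class-conditional probabilities by counting over the 14 training tuples, then score the query tuple X. This is a minimal sketch of the counting approach described in the text, not a production implementation.

```python
# (age, income, student, credit_rating, buys_computer) for R1..R14,
# transcribed from the table above.
data = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31-40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31-40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31-40", "medium", "no",  "excellent", "yes"),
    ("31-40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]

def score(x, cls):
    """P(X / cls) * P(cls), with probabilities estimated by counting."""
    rows = [r for r in data if r[-1] == cls]
    prior = len(rows) / len(data)
    likelihood = 1.0
    for i, value in enumerate(x):
        likelihood *= sum(1 for r in rows if r[i] == value) / len(rows)
    return prior * likelihood

# X = (age = youth, income = medium, student = yes, credit_rating = fair)
x = ("<=30", "medium", "yes", "fair")
print(round(score(x, "yes"), 3))  # 0.028
print(round(score(x, "no"), 3))   # 0.007
```

The two scores match the hand computation above (0.028 vs 0.007), so the predicted class is buys_computer = yes.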
Uses of Naive Bayes classification:

1. Naive Bayes text classification
Bayesian classification is used as a probabilistic learning method for text. Naive Bayes classifiers are among the most successful known algorithms for learning to classify text documents.

2. Spam filtering
Spam filtering is the best-known use of naive Bayesian text classification: a naive Bayes classifier is used to identify spam e-mail. Bayesian spam filtering has become a popular mechanism for distinguishing illegitimate spam email from legitimate email (sometimes called "ham" or "bacn").[4] Many modern mail clients implement Bayesian spam filtering, and users can also install separate email filtering programs. Server-side email filters, such as DSPAM, SpamAssassin, SpamBayes, Bogofilter and ASSP, make use of Bayesian spam filtering techniques, and the functionality is sometimes embedded within the mail server software itself.

3. Hybrid recommender system using a naive Bayes classifier and collaborative filtering
Recommender systems apply machine learning and data mining techniques to filter unseen information and can predict whether a user would like a given resource. A switching hybrid recommendation approach has been proposed that combines a naive Bayes classifier with collaborative filtering. Experimental results on two different data sets show that the algorithm is scalable and provides better performance, in terms of accuracy and coverage, than other algorithms, while eliminating some recorded problems with recommender systems.

4. Online applications
One online application has been set up as a simple example of supervised machine learning and affective computing: using a training set of examples that reflect nice, nasty or neutral sentiments, the system ("Ditto") is trained to distinguish between them. This simple emotion modelling combines a statistically based classifier with a dynamical model. The naive Bayes classifier employs single words and word pairs as features. It allocates user utterances into nice, nasty and neutral classes, labelled +1, -1 and 0 respectively. This numerical output drives a simple first-order dynamical system, whose state represents the simulated emotional state of the experiment's personification.
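A minimal sketch of naive Bayes text classification in the spam-filtering spirit described above. The tiny four-document corpus, the word-count features, and the add-one (Laplace) smoothing constant are all illustrative assumptions, not details from the text.

```python
import math
from collections import Counter

# Tiny made-up training corpus of (document, class) pairs.
train = [
    ("win cash prize now", "spam"),
    ("cheap prize win win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch meeting on monday", "ham"),
]

vocab = {w for doc, _ in train for w in doc.split()}
word_counts = {c: Counter() for c in ("spam", "ham")}
doc_counts = Counter()
for doc, c in train:
    doc_counts[c] += 1
    word_counts[c].update(doc.split())

def log_posterior(doc, c):
    # log P(c) + sum over words of log P(word / c), with add-one smoothing
    # so that unseen words do not zero out the whole product.
    total = sum(word_counts[c].values())
    logp = math.log(doc_counts[c] / len(train))
    for w in doc.split():
        logp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
    return logp

def classify(doc):
    return max(("spam", "ham"), key=lambda c: log_posterior(doc, c))

print(classify("win a cash prize"))        # spam
print(classify("agenda for the meeting"))  # ham
```

Working in log space is the standard trick here: multiplying many small word probabilities underflows quickly, while summing their logarithms does not.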
Bibliography:

http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
http://en.wikipedia.org
http://eprints.ecs.soton.ac.uk/18483/
http://www.convo.co.uk/x02/
http://www.statsoft.com
Data Mining: Concepts and Techniques, Jiawei Han
