Building a Naive Bayes Classifier
            Eric Wilson
          Search Engineer
           Manta Media
The problem: Undesirable Content
Recommended by 3 people:
Bob Perkins
It is a pleasure to work with Kim! Her work is beautiful and she is
professional, communicative, and friendly.

Fred
She lied and stole my money, STAY AWAY!!!!!

Jane Robinson
Very Quick Turn Around as asked - Synced up Perfectly Great Help!
Possible solutions
●   First approach: manually remove undesired
    content.
●   Attempt to filter based on lists of banned
    words.
●   Use a machine learning algorithm to identify
    undesirable content based on a small set of
    manually classified examples.
Using Naive Bayes isn't too hard!
●   We'll need a bit of probability, including the
    concept of conditional probability.
●   A few natural language processing ideas will
    be necessary.
●   Facility with any modern programming
    language.
●   Persistence with many details.
Probability 101
Suppose we choose a number uniformly at random from the set:
           U = {1,2,3,4,5,6,7,8,9,10}
Let A be the event that the number is even,
and B be the event that the number is prime.


Compute P(A), P(B), P(A|B), and P(B|A),
where P(A|B) is the probability of A given B.
Just count!

[Venn diagram: circle A holds the even numbers
{2,4,6,8,10}, circle B holds the primes
{2,3,5,7}, the circles overlap at 2, and 1 and 9
lie outside both.]

P(A) = 5/10 = 1/2
P(B) = 4/10 = 2/5
P(A|B) = 1/4
P(B|A) = 1/5
Bayes Theorem
P(A|B) = P(AB)/P(B)

P(B)P(A|B) = P(AB)

P(B)P(A|B) = P(A)P(B|A)

P(A|B) = P(A)P(B|A)/P(B)
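
As a quick check against the counting example above:
P(A|B) = P(A)P(B|A)/P(B) = (1/2)(1/5)/(2/5) = 1/4,
which matches the value we got by counting directly.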
A simplistic language model
Consider each document to be a bag of words:
the words it contains, along with their frequencies.
For example: “The premium quality for the
discount price” is viewed as:
{'the':2, 'premium':1, 'quality':1, 'for':1, 
'discount':1, 'price':1}

Same as “The discount quality for the premium
price,” since we don't care about order.
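
A minimal sketch of this model in Python (the function name
and regex are illustrative, not from the slides):

import re
from collections import Counter

def bag_of_words(text):
    # Lowercase, keep word-like tokens, and count them.
    # Order is discarded; only words and frequencies remain.
    return Counter(re.findall(r"[a-z']+", text.lower()))

bag_of_words("The premium quality for the discount price")
# Counter({'the': 2, 'premium': 1, 'quality': 1, 'for': 1,
#          'discount': 1, 'price': 1})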
That seems … foolish
●   English is so complicated that we won't have
    any real hope of understanding semantics.
●   In many real-life scenarios, text that you want
    to classify is not exactly subtle.
●   If necessary, we can improve our language
    model later.
An example:
Type       Text                          Class
Training   Good happy good               Positive
Training   Good good service             Positive
Training   Good friendly                 Positive
Training   Lousy good cheat              Negative
Test       Good good good cheat lousy    ??

So that we can perform every calculation by
hand, we will use an example with extremely
small documents.
What was the question?
We are trying to determine whether the last
recommendation was positive or negative.
We want to compute:
P(Pos|good good good lousy cheat)
By Bayes Theorem, this is equal to:
P(Pos)P(good good good lousy cheat|Pos) / P(good good good lousy cheat)
What do we know?
P(Pos) = 3/4
P(good|Pos), P(cheat|Pos), and P(lousy|Pos)
are all easily computed by counting in the
training set.
Which is almost what we want ...
Wouldn't it be nice ...
Maybe we have all we need? Isn't
P(good good good lousy cheat|Pos) =
P(good|Pos)³P(lousy|Pos)P(cheat|Pos) ?


Well, yes, if these are independent events,
which almost certainly doesn't hold.
The “naive” assumption is that we can
consider these events independent.
The Naive Bayes Algorithm
If C1,C2,...,Cn are classes, and an instance has
features F1,F2,...,Fm, then the most likely class
for this instance is the one that maximizes the
following:
        P(Ci)P(F1|Ci)P(F2|Ci)···P(Fm|Ci)
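
In code this is a straightforward argmax loop. A minimal
sketch, assuming prior(c) and word_prob(w, c) lookup
functions are available (both names are hypothetical):

def most_likely_class(words, classes, prior, word_prob):
    # Return the class c maximizing P(c) * P(w1|c) * ... * P(wm|c).
    best_class, best_score = None, 0.0
    for c in classes:
        score = prior(c)
        for w in words:
            score *= word_prob(w, c)
        if best_class is None or score > best_score:
            best_class, best_score = c, score
    return best_class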
Wasn't there a denominator?
If our goal were to compute the probability of
the most likely class, we would divide by:
               P(F1)P(F2)...P(Fm)
We can ignore this part because we only care
about which class has the highest probability,
and this term is the same for each class.
Interesting theory but …
Won't this break as soon as we encounter a
word that isn't in our training set?
For example, if “goood” is not in our training
set but occurs in our test set, then
P(goood|C) = 0 for every class C, and so every
product is zero.
We need nonzero probabilities for all words,
even words that don't exist.
Plus-one smoothing
Just count every word one time more than it
actually occurs.
Since we are only concerned with relative
probabilities, this inaccuracy should be of no
concern.
P(word|C) = (count(word|C) + 1) / (count(C) + V)

(V is the number of distinct words in the
vocabulary, so that our probabilities sum to 1.)
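
Translated directly into Python (a sketch; the argument
names are assumptions matching the nested-map layout on a
later slide):

def smoothed_prob(word, c, word_counts, class_totals, vocab_size):
    # word_counts[c] maps each word to its count within class c;
    # class_totals[c] is the total number of word tokens seen in c.
    return (word_counts[c].get(word, 0) + 1) / (class_totals[c] + vocab_size)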
Let's try it out:

P(Pos) = 3/4
P(Neg) = 1/4

Type      Text                         Class
Training  Good happy good              Positive
Training  Good good service            Positive
Training  Good friendly                Positive
Training  Lousy good cheat             Negative
Test      Good good good cheat lousy   ??

P(good|Pos)  = (5+1)/(8+6) = 3/7
P(cheat|Pos) = (0+1)/(8+6) = 1/14
P(lousy|Pos) = (0+1)/(8+6) = 1/14
P(good|Neg)  = (1+1)/(3+6) = 2/9
P(cheat|Neg) = (1+1)/(3+6) = 2/9
P(lousy|Neg) = (1+1)/(3+6) = 2/9

P(Pos|D5) ~ 3/4 · (3/7)³ · (1/14) · (1/14) ≈ 0.0003
P(Neg|D5) ~ 1/4 · (2/9)³ · (2/9) · (2/9) ≈ 0.0001
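
These numbers are easy to verify (a quick sanity check, not
part of the original slides):

pos = 3/4 * (3/7)**3 * (1/14) * (1/14)   # ≈ 0.0003
neg = 1/4 * (2/9)**3 * (2/9) * (2/9)     # ≈ 0.0001
print(pos > neg)   # True: the test document is classified Positive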
Training the classifier
●   Count instances of classes, store counts in a map.
●   Store counts of all words in a nested map:
    {'pos':
        {'good': 5, 'friendly': 1, 'service': 1, 'happy': 1},
    'neg':
        {'cheat': 1, 'lousy': 1, 'good': 1}
    }
●   Should be easy to compute probabilities.
●   Should be efficient (training time and memory).
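
In Python, training can be a few lines (a sketch; the
function and variable names are illustrative):

from collections import Counter, defaultdict

def train(labeled_docs):
    # labeled_docs: iterable of (class_label, list_of_words) pairs.
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for label, words in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(words)
    return class_counts, word_counts

docs = [('pos', ['good', 'happy', 'good']),
        ('pos', ['good', 'good', 'service']),
        ('pos', ['good', 'friendly']),
        ('neg', ['lousy', 'good', 'cheat'])]
class_counts, word_counts = train(docs)
# class_counts['pos'] == 3; word_counts['pos']['good'] == 5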
Some practical problems
●   Tokenization
●   Arithmetic
●   How to evaluate results?
Tokenization
●   Use whitespace?
    –   “food”, “food.”, “food,” and “food!” are all different.
●   Use whitespace and punctuation?
    –   “won't” tokenized to “won” and “t”
●   What about emails? URLs? Phone numbers?
    What about the things we haven't thought
    about yet?
●   Use a library. Lucene is a good choice.
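
Both pitfalls, concretely (an illustrative Python snippet;
a production system would use a library tokenizer as
suggested above):

import re

text = "I won't pay. STAY AWAY!"
print(text.split())
# ['I', "won't", 'pay.', 'STAY', 'AWAY!']  -> 'pay.' and 'AWAY!' keep punctuation
print(re.findall(r"\w+", text))
# ['I', 'won', 't', 'pay', 'STAY', 'AWAY']  -> "won't" splits into 'won' and 't'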
Arithmetic
What happens when you multiply a large number
of small numbers together?
To prevent underflow, use sums of logs instead
of products of true probabilities.
Key properties of log:
   ●   log(AB) = log(A) + log(B)
   ●   x > y => log(x) > log(y)
   ●   Turns very small numbers into manageable
       negative numbers
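
In log space the product becomes a sum. A sketch, reusing
the hypothetical prior and word_prob lookups from the
earlier sketch (safe because smoothing keeps every
probability positive):

import math

def log_score(words, c, prior, word_prob):
    # log(P(c) * P(w1|c) * ... * P(wm|c)) = log P(c) + sum of log P(wi|c)
    score = math.log(prior(c))
    for w in words:
        score += math.log(word_prob(w, c))
    return score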
Evaluating a classifier
●   Precision and recall
●   Confusion matrix
●   Divide the training set into ten “folds”, train the
    classifier on nine of them, and check the accuracy of
    classifying the tenth fold
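
A sketch of that k-fold procedure (train and classify are
hypothetical callables; any implementation with those
shapes works):

def cross_validate(docs, train, classify, k=10):
    # docs: list of (label, words) pairs.
    folds = [docs[i::k] for i in range(k)]   # round-robin split into k folds
    scores = []
    for i in range(k):
        held_out = folds[i]
        training = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model = train(training)
        correct = sum(1 for label, words in held_out
                      if classify(model, words) == label)
        scores.append(correct / len(held_out))
    return sum(scores) / k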
Experiment
●   Tokenization strategies
    –   Stop words
    –   Capitalization
    –   Stemming
●   Language model
    –   Ignore multiplicities
    –   Smoothing
Contact me
●   wilson.eric.n@gmail.com
●   ewilson@manta.com
●   @wilsonericn
●   http://wilsonericn.wordpress.com
