SlideShare a Scribd company logo
Text Classification and Naïve Bayes
An example of text classification
Definition of a machine learning problem
A refresher on probability
The Naive Bayes classifier
1
Google News
2
Different ways for classification
Human labor (people assign categories to every incoming
article)
Hand-crafted rules for automatic classification
 If article contains: stock, Dow, share, Nasdaq, etc.  Business
 If article contains: set, breakpoint, player, Federer, etc.  Tennis
Machine learning algorithms
3
What is Machine Learning?
4
Definition: A computer program is said to learn from
experience E when its performance P at a task T
improves with experience E.
Tom Mitchell, Machine Learning, 1997
Examples:
- Learning to recognize spoken words
- Learning to drive a vehicle
- Learning to play backgammon
Components of a ML System (1)
Experience (a set of examples that combines together
input and output for a task)
 Text categorization: document + category
 Speech recognition: spoken text + written text
Experience is referred to as Training Data. When training
data is available, we talk of Supervised Learning.
Performance metrics
 Error or accuracy in the Test Data
 Test Data are not present in the Training Data
 When there are few training data, methods like ‘leave-one-out’ or
‘ten-fold cross validation’ are used to measure error.
5
Components of a ML System (2)
Type of knowledge to be learned (known as the target
function, that will map between input and output)
Representation of the target function
 Decision trees
 Neural networks
 Linear functions
The learning algorithm
 C4.5 (learns decision trees)
 Gradient descent (learns a neural network)
 Linear programming (learns linear functions)
6
Task
Defining Text Classification
7
XdX∈d
},,,{ 21 Jccc =C
D cd,
C×∈Xcd,
C→X:γ
γ=Γ D)(
the document in the multi-dimensional space
a set of classes (categories, or labels)
the training set of labeled documents
Target function:
Learning algorithm:
=cd, “Beijing joins the World Trade Organization”, China
cd =)(γ =)(dγ China
Naïve Bayes Learning
8
∏≤≤∈∈
==
dnk
k
CcCc
MAP ctPcPdcPc
1
)|(ˆ)(ˆmaxarg)|(ˆmaxarg
cd =)(γ
Learning Algorithm: Naïve Bayes
Target Function:
)|()(maxarg)|(maxarg cdPcPdcPc
CcCc
MAP
∈∈
==
)(cP
)|( cdP
The generative process:
)|( dcP
a priori probability, of choosing a category
the cond. prob. of generating d, given the fixed c
a posteriori probability that c generated d
A Refresher on Probability
9
Visualizing probability
A is a random variable that denotes an uncertain event
 Example: A = “I’ll get an A+ in the final exam”
P(A) is “the fraction of possible worlds where A is true”
10
Worlds in
which A
is true
Slide: Andrew W. Moore
Worlds in which A is false
Event space of all possible
worlds. Its area is 1.
P(A) = Area of the blue
circle.
Axioms and Theorems of Probability
Axioms:
 0 <= P(A) <= 1
 P(True) = 1
 P(False) = 0
 P(A or B) = P(A) + P(B) – P(A and B)
Theorems:
 P(not A) = P(~A) = 1 – P(A)
 P(A) = P(A ^ B) + P(A ^ ~B)
11
Conditional Probability
P(A|B) = the probability of A being true, given that we
know that B is true
12
F
H
H = “I have a headache”
F = “Coming down with flu”
P(H) = 1/10
P(F) = 1/40
P(H/F) = 1/2
Slide: Andrew W. Moore
Headaches are rare and flu
even rarer, but if you got that flu,
there is a 50-50 chance you’ll
have a headache.
Deriving the Bayes Rule
13
)(
)(
)|(
BP
BAP
BAP
∧
=Conditional Probability:
)()|()( BPBAPBAP =∧Chain rule:
)()|()()( APABPABPBAP =∧=∧
Bayes Rule:
)(
)()|(
)|(
AP
BPBAP
ABP =
Back to the Naïve Bayes Classifier
14
Deriving the Naïve Bayes
15
)(
)()|(
)|(
AP
BPBAP
ABP = (Bayes Rule)
21,cc 'dGiven two classes and the document
)'(
)|'()(
)'|( 11
1
dP
cdPcP
dcP =
)'(
)|'()(
)'|( 22
2
dP
cdPcP
dcP =
We are looking for a that maximizes the a-posterioriic )'|( dcP i
)'(dP (the denominator) is the same in both cases
)|()(maxarg cdPcPc
Cc
MAP
∈
=Thus:
Estimating parameters for the
target function
We are looking for the estimates and
16
)(ˆ cP )|(ˆ cdP
P(c) is the fraction of possible worlds where c is true.
N
N
cP c
=)(ˆ N – number of all documents
Nc – number of documents in class c
d is a vector in the space X
)|,,,()|( 2 ctttPcdP dni =
where each dimension is a term:
)()|()( BPBAPBAP =∧By using the chain rule: we have:
(P
),,...,(),,...,|()|,,,( 2212 cttPctttPctttP ddd nnni =
...=
Naïve assumptions of independence
1. All attribute values are independent of each other given
the class. (conditional independence assumption)
2. The conditional probabilities for a term are the same
independent of position in the document.
We assume the document is a “bag-of-words”.
17
∏≤≤
==
d
d
nk
kni ctPctttPcdP
1
2 )|()|,,,()|( 
∏≤≤∈∈
==
dnk
k
CcCc
MAP ctPcPdcPc
1
)|(ˆ)(ˆmaxarg)|(ˆmaxarg
Finally, we get the target function of Slide 8:
Again about estimation
18
For each term, t, we need to estimate P(t|c)
∑ ∈
=
Vt ct
ct
T
T
ctP
' '
)|(ˆ
Because an estimate will be 0 if a term does not appear with a class
in the training data, we need smoothing:
||)(
1
)1(
1
)|(ˆ
' '' ' VT
T
T
T
ctP
Vt ct
ct
Vt ct
ct
∑∑ ∈∈
+
+
=
+
+
=Laplace
Smoothing
|V| is the number of terms in the vocabulary
Tct is the count of term t in all documents of class c
An Example of classification with
Naïve Bayes
19
Example 13.1 (Part 1)
20
Training
set
docID c = China?
1 Chinese Beijing Chinese Yes
2 Chinese Chinese Shangai Yes
3 Chinese Macao Yes
4 Tokyo Japan Chinese No
Test set 5 Chinese Chinese Chinese Tokyo Japan ?
Two classes: “China”, “not China”
N = 4 4/3)(ˆ =cP 4/1)(ˆ =cP
V = {Beijing, Chinese, Japan, Macao, Tokyo}
Example 13.1 (Part 1)
21
Training
set
docID c = China?
1 Chinese Beijing Chinese Yes
2 Chinese Chinese Shangai Yes
3 Chinese Macao Yes
4 Tokyo Japan Chinese No
Test set 5 Chinese Chinese Chinese Tokyo Japan ?
7/3)68/()15()|Chinese(ˆ =++=cP
14/1)68/()10()|Japan(ˆ)|Tokyo(ˆ =++== cPcP
9/2)63/()11()|Chinese(ˆ =++=cP
9/2)63/()11()|Japan(ˆ)|Tokyo(ˆ =++== cPcP
Estimation Classification
∏≤≤
∝
dnk
k ctPcPdcP
1
)|()()|(
0001.09/29/2)9/2(4/1)|(
0003.014/114/1)7/3(4/3)|(
3
5
3
5
≈⋅⋅⋅∝
≈⋅⋅⋅∝
dcP
dcP
Summary: Miscellanious
Naïve Bayes is linear in the time is takes to scan the data
When we have many terms, the product of probabilities
with cause a floating point underflow, therefore:
For a large training set, the vocabulary is large. It is better
to select only a subset of terms. For that is used “feature
selection” (Section 13.5).
22
∑≤≤∈
+=
dnk
k
Cc
MAP ctPcPc
1
)|(log)(ˆ[logmaxarg

More Related Content

What's hot

Association rule mining
Association rule miningAssociation rule mining
Association rule mining
Acad
 
Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
Sulman Ahmed
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
Krish_ver2
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
Krish_ver2
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Saurabh Kaushik
 
BERT MODULE FOR TEXT CLASSIFICATION.pptx
BERT MODULE FOR TEXT CLASSIFICATION.pptxBERT MODULE FOR TEXT CLASSIFICATION.pptx
BERT MODULE FOR TEXT CLASSIFICATION.pptx
ManvanthBC
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
Xueping Peng
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLP
Rupak Roy
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
Shubhankar Mohan
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming text
Sujit Pal
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language Model
Hemantha Kulathilake
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
Krish_ver2
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
Mustafa Sherazi
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Bhupender Sharma
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
Ashraf Uddin
 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVM
Trilok Sharma
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
Pravinkumar Landge
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
Prakash Pimpale
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
佳蓉 倪
 

What's hot (20)

Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Assosiate rule mining
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
BERT MODULE FOR TEXT CLASSIFICATION.pptx
BERT MODULE FOR TEXT CLASSIFICATION.pptxBERT MODULE FOR TEXT CLASSIFICATION.pptx
BERT MODULE FOR TEXT CLASSIFICATION.pptx
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLP
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming text
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language Model
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVM
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
 

Viewers also liked

Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027
ankitgadgil
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event Models
DKALab
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier
Dev Sahu
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
CJ Jenkins
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
jakehofman
 
Sentiment analysis of tweets
Sentiment analysis of tweetsSentiment analysis of tweets
Sentiment analysis of tweets
Vasu Jain
 
Supervised algorithms
Supervised algorithmsSupervised algorithms
Supervised algorithms
Yassine Akhiat
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviews
Subhabrata Mukherjee
 
Text Classification/Categorization
Text Classification/CategorizationText Classification/Categorization
Text Classification/Categorization
Oswal Abhishek
 
Text classification
Text classificationText classification
Text classification
James Wong
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
M. Atif Qureshi
 
Sentiment Analysis in R
Sentiment Analysis in RSentiment Analysis in R
Sentiment Analysis in R
Edureka!
 
Sentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSentiment Analysis via R Programming
Sentiment Analysis via R Programming
Skillspeed
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
Paras Kohli
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised Learning
butest
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
Tonmoy Bhagawati
 
Cs221 lecture5-fall11
Cs221 lecture5-fall11Cs221 lecture5-fall11
Cs221 lecture5-fall11
darwinrlo
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
Jaganadh Gopinadhan
 
Classification with Naive Bayes
Classification with Naive BayesClassification with Naive Bayes
Classification with Naive Bayes
Josh Patterson
 
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
Dongseo University
 

Viewers also liked (20)

Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event Models
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
Sentiment analysis of tweets
Sentiment analysis of tweetsSentiment analysis of tweets
Sentiment analysis of tweets
 
Supervised algorithms
Supervised algorithmsSupervised algorithms
Supervised algorithms
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviews
 
Text Classification/Categorization
Text Classification/CategorizationText Classification/Categorization
Text Classification/Categorization
 
Text classification
Text classificationText classification
Text classification
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
 
Sentiment Analysis in R
Sentiment Analysis in RSentiment Analysis in R
Sentiment Analysis in R
 
Sentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSentiment Analysis via R Programming
Sentiment Analysis via R Programming
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised Learning
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Cs221 lecture5-fall11
Cs221 lecture5-fall11Cs221 lecture5-fall11
Cs221 lecture5-fall11
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Classification with Naive Bayes
Classification with Naive BayesClassification with Naive Bayes
Classification with Naive Bayes
 
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
 

Similar to Text classification

Naive Bayes Presentation
Naive Bayes PresentationNaive Bayes Presentation
Naive Bayes Presentation
Md. Enamul Haque Chowdhury
 
Joint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labelsJoint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labels
Cheng-You Lu
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in Python
Chun-Ming Chang
 
Pattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifierPattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifier
108kaushik
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
SwarnaKumariChinni
 
An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Python
freshdatabos
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程
台灣資料科學年會
 
Automatic bayesian cubature
Automatic bayesian cubatureAutomatic bayesian cubature
Automatic bayesian cubature
Jagadeeswaran Rathinavel
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2
butest
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Language
vsssuresh
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2
butest
 
ch8Bayes.pptx
ch8Bayes.pptxch8Bayes.pptx
ch8Bayes.pptx
ssuser4c50a9
 
Python Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all departmentPython Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all department
Nazeer Wahab
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
Pierre Schaus
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and prediction
Margaret Wang
 
ch8Bayes.ppt
ch8Bayes.pptch8Bayes.ppt
ch8Bayes.ppt
ImXaib
 
ch8Bayes.ppt
ch8Bayes.pptch8Bayes.ppt
ch8Bayes.ppt
INyomanSwitrayana
 
Introduction
IntroductionIntroduction
Introduction
butest
 
Design and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.pptDesign and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.ppt
moiza354
 
Midterm
MidtermMidterm

Similar to Text classification (20)

Naive Bayes Presentation
Naive Bayes PresentationNaive Bayes Presentation
Naive Bayes Presentation
 
Joint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labelsJoint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labels
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in Python
 
Pattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifierPattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifier
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
 
An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Python
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程
 
Automatic bayesian cubature
Automatic bayesian cubatureAutomatic bayesian cubature
Automatic bayesian cubature
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Language
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2
 
ch8Bayes.pptx
ch8Bayes.pptxch8Bayes.pptx
ch8Bayes.pptx
 
Python Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all departmentPython Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all department
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and prediction
 
ch8Bayes.ppt
ch8Bayes.pptch8Bayes.ppt
ch8Bayes.ppt
 
ch8Bayes.ppt
ch8Bayes.pptch8Bayes.ppt
ch8Bayes.ppt
 
Introduction
IntroductionIntroduction
Introduction
 
Design and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.pptDesign and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.ppt
 
Midterm
MidtermMidterm
Midterm
 

More from Harry Potter

How to build a rest api.pptx
How to build a rest api.pptxHow to build a rest api.pptx
How to build a rest api.pptx
Harry Potter
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
Harry Potter
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data mining
Harry Potter
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
Harry Potter
 
Cache recap
Cache recapCache recap
Cache recap
Harry Potter
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
Harry Potter
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching works
Harry Potter
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
Harry Potter
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cache
Harry Potter
 
Smm & caching
Smm & cachingSmm & caching
Smm & caching
Harry Potter
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
Harry Potter
 
Abstract data types
Abstract data typesAbstract data types
Abstract data types
Harry Potter
 
Abstraction file
Abstraction fileAbstraction file
Abstraction file
Harry Potter
 
Object model
Object modelObject model
Object model
Harry Potter
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with java
Harry Potter
 
Encapsulation anonymous class
Encapsulation anonymous classEncapsulation anonymous class
Encapsulation anonymous class
Harry Potter
 
Abstract class
Abstract classAbstract class
Abstract class
Harry Potter
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysis
Harry Potter
 
Api crash
Api crashApi crash
Api crash
Harry Potter
 
Rest api to integrate with your site
Rest api to integrate with your siteRest api to integrate with your site
Rest api to integrate with your site
Harry Potter
 

More from Harry Potter (20)

How to build a rest api.pptx
How to build a rest api.pptxHow to build a rest api.pptx
How to build a rest api.pptx
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data mining
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Cache recap
Cache recapCache recap
Cache recap
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching works
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cache
 
Smm & caching
Smm & cachingSmm & caching
Smm & caching
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
 
Abstract data types
Abstract data typesAbstract data types
Abstract data types
 
Abstraction file
Abstraction fileAbstraction file
Abstraction file
 
Object model
Object modelObject model
Object model
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with java
 
Encapsulation anonymous class
Encapsulation anonymous classEncapsulation anonymous class
Encapsulation anonymous class
 
Abstract class
Abstract classAbstract class
Abstract class
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysis
 
Api crash
Api crashApi crash
Api crash
 
Rest api to integrate with your site
Rest api to integrate with your siteRest api to integrate with your site
Rest api to integrate with your site
 

Recently uploaded

dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
Intelisync
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 

Recently uploaded (20)

dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 

Text classification

  • 1. Text Classification and Naïve Bayes An example of text classification Definition of a machine learning problem A refresher on probability The Naive Bayes classifier 1
  • 3. Different ways for classification Human labor (people assign categories to every incoming article) Hand-crafted rules for automatic classification  If article contains: stock, Dow, share, Nasdaq, etc.  Business  If article contains: set, breakpoint, player, Federer, etc.  Tennis Machine learning algorithms 3
  • 4. What is Machine Learning? 4 Definition: A computer program is said to learn from experience E when its performance P at a task T improves with experience E. Tom Mitchell, Machine Learning, 1997 Examples: - Learning to recognize spoken words - Learning to drive a vehicle - Learning to play backgammon
  • 5. Components of a ML System (1) Experience (a set of examples that combines together input and output for a task)  Text categorization: document + category  Speech recognition: spoken text + written text Experience is referred to as Training Data. When training data is available, we talk of Supervised Learning. Performance metrics  Error or accuracy in the Test Data  Test Data are not present in the Training Data  When there are few training data, methods like ‘leave-one-out’ or ‘ten-fold cross validation’ are used to measure error. 5
  • 6. Components of a ML System (2) Type of knowledge to be learned (known as the target function, that will map between input and output) Representation of the target function  Decision trees  Neural networks  Linear functions The learning algorithm  C4.5 (learns decision trees)  Gradient descent (learns a neural network)  Linear programming (learns linear functions) 6 Task
  • 7. Defining Text Classification 7 XdX∈d },,,{ 21 Jccc =C D cd, C×∈Xcd, C→X:γ γ=Γ D)( the document in the multi-dimensional space a set of classes (categories, or labels) the training set of labeled documents Target function: Learning algorithm: =cd, “Beijing joins the World Trade Organization”, China cd =)(γ =)(dγ China
  • 8. Naïve Bayes Learning 8 ∏≤≤∈∈ == dnk k CcCc MAP ctPcPdcPc 1 )|(ˆ)(ˆmaxarg)|(ˆmaxarg cd =)(γ Learning Algorithm: Naïve Bayes Target Function: )|()(maxarg)|(maxarg cdPcPdcPc CcCc MAP ∈∈ == )(cP )|( cdP The generative process: )|( dcP a priori probability, of choosing a category the cond. prob. of generating d, given the fixed c a posteriori probability that c generated d
  • 9. A Refresher on Probability 9
  • 10. Visualizing probability A is a random variable that denotes an uncertain event  Example: A = “I’ll get an A+ in the final exam” P(A) is “the fraction of possible worlds where A is true” 10 Worlds in which A is true Slide: Andrew W. Moore Worlds in which A is false Event space of all possible worlds. Its area is 1. P(A) = Area of the blue circle.
  • 11. Axioms and Theorems of Probability Axioms:  0 <= P(A) <= 1  P(True) = 1  P(False) = 0  P(A or B) = P(A) + P(B) – P(A and B) Theorems:  P(not A) = P(~A) = 1 – P(A)  P(A) = P(A ^ B) + P(A ^ ~B) 11
  • 12. Conditional Probability P(A|B) = the probability of A being true, given that we know that B is true 12 F H H = “I have a headache” F = “Coming down with flu” P(H) = 1/10 P(F) = 1/40 P(H/F) = 1/2 Slide: Andrew W. Moore Headaches are rare and flu even rarer, but if you got that flu, there is a 50-50 chance you’ll have a headache.
  • 13. Deriving the Bayes Rule 13 )( )( )|( BP BAP BAP ∧ =Conditional Probability: )()|()( BPBAPBAP =∧Chain rule: )()|()()( APABPABPBAP =∧=∧ Bayes Rule: )( )()|( )|( AP BPBAP ABP =
  • 14. Back to the Naïve Bayes Classifier 14
  • 15. Deriving the Naïve Bayes 15 )( )()|( )|( AP BPBAP ABP = (Bayes Rule) 21,cc 'dGiven two classes and the document )'( )|'()( )'|( 11 1 dP cdPcP dcP = )'( )|'()( )'|( 22 2 dP cdPcP dcP = We are looking for a that maximizes the a-posterioriic )'|( dcP i )'(dP (the denominator) is the same in both cases )|()(maxarg cdPcPc Cc MAP ∈ =Thus:
  • 16. Estimating parameters for the target function We are looking for the estimates and 16 )(ˆ cP )|(ˆ cdP P(c) is the fraction of possible worlds where c is true. N N cP c =)(ˆ N – number of all documents Nc – number of documents in class c d is a vector in the space X )|,,,()|( 2 ctttPcdP dni = where each dimension is a term: )()|()( BPBAPBAP =∧By using the chain rule: we have: (P ),,...,(),,...,|()|,,,( 2212 cttPctttPctttP ddd nnni = ...=
  • 17. Naïve assumptions of independence 1. All attribute values are independent of each other given the class. (conditional independence assumption) 2. The conditional probabilities for a term are the same independent of position in the document. We assume the document is a “bag-of-words”. 17 ∏≤≤ == d d nk kni ctPctttPcdP 1 2 )|()|,,,()|(  ∏≤≤∈∈ == dnk k CcCc MAP ctPcPdcPc 1 )|(ˆ)(ˆmaxarg)|(ˆmaxarg Finally, we get the target function of Slide 8:
  • 18. Again about estimation 18 For each term, t, we need to estimate P(t|c) ∑ ∈ = Vt ct ct T T ctP ' ' )|(ˆ Because an estimate will be 0 if a term does not appear with a class in the training data, we need smoothing: ||)( 1 )1( 1 )|(ˆ ' '' ' VT T T T ctP Vt ct ct Vt ct ct ∑∑ ∈∈ + + = + + =Laplace Smoothing |V| is the number of terms in the vocabulary Tct is the count of term t in all documents of class c
  • 19. An Example of classification with Naïve Bayes 19
  • 20. Example 13.1 (Part 1) 20 Training set docID c = China? 1 Chinese Beijing Chinese Yes 2 Chinese Chinese Shangai Yes 3 Chinese Macao Yes 4 Tokyo Japan Chinese No Test set 5 Chinese Chinese Chinese Tokyo Japan ? Two classes: “China”, “not China” N = 4 4/3)(ˆ =cP 4/1)(ˆ =cP V = {Beijing, Chinese, Japan, Macao, Tokyo}
  • 21. Example 13.1 (Part 1) 21 Training set docID c = China? 1 Chinese Beijing Chinese Yes 2 Chinese Chinese Shangai Yes 3 Chinese Macao Yes 4 Tokyo Japan Chinese No Test set 5 Chinese Chinese Chinese Tokyo Japan ? 7/3)68/()15()|Chinese(ˆ =++=cP 14/1)68/()10()|Japan(ˆ)|Tokyo(ˆ =++== cPcP 9/2)63/()11()|Chinese(ˆ =++=cP 9/2)63/()11()|Japan(ˆ)|Tokyo(ˆ =++== cPcP Estimation Classification ∏≤≤ ∝ dnk k ctPcPdcP 1 )|()()|( 0001.09/29/2)9/2(4/1)|( 0003.014/114/1)7/3(4/3)|( 3 5 3 5 ≈⋅⋅⋅∝ ≈⋅⋅⋅∝ dcP dcP
  • 22. Summary: Miscellanious Naïve Bayes is linear in the time is takes to scan the data When we have many terms, the product of probabilities with cause a floating point underflow, therefore: For a large training set, the vocabulary is large. It is better to select only a subset of terms. For that is used “feature selection” (Section 13.5). 22 ∑≤≤∈ += dnk k Cc MAP ctPcPc 1 )|(log)(ˆ[logmaxarg

Editor's Notes

  1. Q: What is different in this definition from other types of computer programs? A: We do not speak about experience in other occasions, just about the task and performance criteria. Q: If the task T is speech recognition, could you imagine what would be E and P? A: E would be examples of spoken text, i.e., the computer has the written text and while someone speaks the computer matches the written words to the spoken words. P (performance) will be the number of words that the computer recognizes correctly.
  2. We give the target function at the beginning, but we say that we are going to explain later on how this formula is derived (after the refresher in probability). Give the example of selecting topics for the class project, that means, selecting c. Then, given c, the choice of d, is conditional, P(d|c).
  3. It is clear that calculating all the parameters that derive from the application of the chain rule is infeasible. Therefore, we need the naïve assumptions of independence in next page.