Naïve Bayes
Chapter 4, DDS
Introduction
• We discussed Bayes' Rule last class. Here is its derivation from first principles of probability:
– $P(A|B) = P(A \cap B)/P(B)$
– $P(B|A) = P(A \cap B)/P(A)$, so $P(B|A)\,P(A) = P(A \cap B)$
– Substituting into the first line: $P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$
• Now let's look at a very common application of Bayes' Rule to supervised learning for classification: spam filtering
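The derivation at the start of this slide can be checked numerically. The joint distribution over two events A and B below is made up, chosen only so the fractions are easy to follow. Python's `fractions.Fraction` keeps the arithmetic exact, so the definition of conditional probability and Bayes' Rule can be compared with `==`.

```python
from fractions import Fraction

# Hypothetical joint distribution over two binary events A and B:
# keys are (A happened?, B happened?), values are probabilities summing to 1.
joint = {
    (True, True): Fraction(1, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(3, 10),
    (False, False): Fraction(4, 10),
}

p_a_and_b = joint[(True, True)]                     # P(A & B)
p_a = sum(p for (a, _), p in joint.items() if a)    # P(A), summing over B
p_b = sum(p for (_, b), p in joint.items() if b)    # P(B), summing over A

p_a_given_b = p_a_and_b / p_b                       # definition of P(A|B)
p_b_given_a = p_a_and_b / p_a                       # definition of P(B|A)
bayes = p_b_given_a * p_a / p_b                     # Bayes' Rule

print(p_a_given_b, bayes)  # 1/4 1/4
```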
Classification
• Training set → design (train) a model
• Test set → validate the model
• Classify a data set using the model
• Goal of classification: to assign each item in the set to one of a given set of known classes
• For spam filtering there are two classes: spam or not spam (ham)
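A minimal sketch of the training/test split described above, assuming the labeled data is a list of (text, label) pairs. The helper `train_test_split` is hand-written and hypothetical (not scikit-learn's function of the same name), and the toy emails are invented. A model would be fit on `train` and validated on `test`.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)      # seeded, so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled emails as (text, label) pairs.
emails = [(f"email {i}", "spam" if i % 3 == 0 else "ham") for i in range(10)]
train, test = train_test_split(emails)
print(len(train), len(test))  # 8 2
```

Holding out the test set before any fitting keeps the validation honest: the model never sees those emails during training.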
Why not use the methods in Ch. 3?
• Linear regression models continuous variables, not a binary class
• k-NN can accommodate many features, but it suffers from the curse of dimensionality: 1 distinct word → 1 feature, so 10,000 words → 10,000 features!
• What are we going to use? Naïve Bayes
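To see why the feature count grows so quickly, here is a sketch that gives every distinct word its own feature index. It assumes naive lowercase whitespace tokenization, a hypothetical simplification (real tokenizers also strip punctuation).

```python
def build_vocabulary(documents):
    """Map each distinct lowercase word to a feature (column) index."""
    vocab = {}
    for doc in documents:
        for word in doc.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)   # next unused column
    return vocab

docs = ["You have won a lottery", "Meeting moved to noon", "lottery meeting"]
vocab = build_vocabulary(docs)
print(len(vocab))  # 9: one feature per distinct word
```

Every new email can add new words, so on a real mail corpus the vocabulary, and with it the dimension k-NN must search in, easily reaches tens of thousands.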
Let's Review
• A rare disease affects 1% of the population
• We have a highly sensitive and specific test that is
– positive for 99% of sick patients
– negative for 99% of non-sick patients
• If a patient tests positive, what is the probability that he/she is sick?
• Notation: “sick” means the patient is sick; “+” means the test is positive
• P(sick|+) = P(+|sick)P(sick)/P(+), where P(+) = P(+|sick)P(sick) + P(+|not sick)P(not sick)
= 0.99*0.01/(0.99*0.01 + 0.01*0.99)
= 0.0099/(2*0.0099) = ½ = 0.5
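The disease calculation can be wrapped in a small function; `posterior_sick` and its parameter names are illustrative, not from the book. Passing `Fraction` values keeps the result exact.

```python
from fractions import Fraction

def posterior_sick(prevalence, sensitivity, specificity):
    """P(sick | +) by Bayes' Rule, with P(+) expanded by total probability."""
    p_pos_given_sick = sensitivity            # P(+|sick)
    p_pos_given_healthy = 1 - specificity     # P(+|not sick): the false-positive rate
    p_pos = p_pos_given_sick * prevalence + p_pos_given_healthy * (1 - prevalence)
    return p_pos_given_sick * prevalence / p_pos

# The slide's numbers: 1% prevalence, 99% sensitivity, 99% specificity.
print(posterior_sick(Fraction(1, 100), Fraction(99, 100), Fraction(99, 100)))   # 1/2
# A rarer disease (0.1%) with the same test: most positives are false positives.
print(posterior_sick(Fraction(1, 1000), Fraction(99, 100), Fraction(99, 100)))  # 11/122
```

The second call (about 0.09) shows why the prior matters: the rarer the disease, the more the false positives from the large healthy population dominate P(+).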
Spam Filter for individual words
Classifying mail into spam and not spam is a binary classification problem.
Say we get an email saying “you have won a lottery”: right away you know it is spam.
We will assume that if a single word marks an email as spam, then the email is spam…
$P(\text{spam} \mid \text{word}) = \frac{P(\text{word} \mid \text{spam})\,P(\text{spam})}{P(\text{word})}$
Further discussion
• Let's call good emails “ham”
• P(ham) = 1 - P(spam)
• P(word) = P(word|spam)P(spam) + P(word|ham)P(ham)
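The three formulas on this slide combine into one function. This is a sketch; the function name and the example probabilities are hypothetical.

```python
from fractions import Fraction

def p_spam_given_word(p_word_given_spam, p_word_given_ham, p_spam):
    """P(spam|word) by Bayes' Rule, using P(ham) = 1 - P(spam) and
    P(word) = P(word|spam)P(spam) + P(word|ham)P(ham)."""
    p_ham = 1 - p_spam
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word

# Hypothetical word found in 20% of spam but only 1% of ham, with equal priors.
print(p_spam_given_word(Fraction(1, 5), Fraction(1, 100), Fraction(1, 2)))  # 20/21

# Sanity check: a word equally common in both classes leaves the prior unchanged.
assert p_spam_given_word(Fraction(1, 10), Fraction(1, 10), Fraction(3, 10)) == Fraction(3, 10)
```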
Sample data
• Enron data: https://www.cs.cmu.edu/~enron
• Enron employee emails
• A small subset chosen for EDA
• 1500 spam, 3672 ham
• The test word is “meeting”; that is, your goal is to label an email containing the word “meeting” as spam or ham (not spam)
• Running a simple shell script shows that 16 spam emails and 153 ham emails contain “meeting”
• Right away, what is your intuition? Now prove it using Bayes' Rule
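A Python stand-in for the shell-script count (not the book's script): it counts how many emails contain the word at least once, which is what P(meeting|spam) = 16/1500 needs. The in-memory toy emails are hypothetical; with the Enron subset you would first read each email file's text into the two lists. Whole-word, case-insensitive matching is an assumption, and it deliberately does not count “meetings”.

```python
import re

def count_emails_containing(emails, word):
    """Count emails in which `word` occurs at least once
    (case-insensitive, whole-word match)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(1 for text in emails if pattern.search(text))

# Hypothetical stand-ins for the spam and ham email texts.
spam = ["Win the lottery now", "MEETING your new credit limit", "cheap meds"]
ham = ["Meeting at noon", "Agenda for the meeting", "meetings moved to Friday"]

print(count_emails_containing(spam, "meeting"),
      count_emails_containing(ham, "meeting"))  # 1 2
```

An email that mentions the word several times still counts once, matching the “fraction of emails in the class that contain the word” reading of P(word|class).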
Calculations
• P(spam) = 1500/(1500+3672) = 0.29
• P(ham) = 1 - P(spam) = 0.71
• P(meeting|spam) = 16/1500 = 0.0107
• P(meeting|ham) = 153/3672 = 0.0417
• P(meeting) = P(meeting|spam)P(spam) + P(meeting|ham)P(ham) = 0.0107*0.29 + 0.0417*0.71 = 0.0327
• P(spam|meeting) = P(meeting|spam)*P(spam)/P(meeting)
= 0.0107*0.29/0.0327 ≈ 0.095 → about 9.5%
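The same calculation with exact fractions, using only the counts from the Sample data slide, avoids the rounding in the hand calculation. It also shows why the answer simplifies: the class sizes cancel, leaving 16/169, the share of “meeting” emails that are spam.

```python
from fractions import Fraction

n_spam, n_ham = 1500, 3672     # emails in each class
k_spam, k_ham = 16, 153        # emails containing "meeting" in each class

p_spam = Fraction(n_spam, n_spam + n_ham)
p_ham = 1 - p_spam
p_meeting_given_spam = Fraction(k_spam, n_spam)
p_meeting_given_ham = Fraction(k_ham, n_ham)

# Total probability, then Bayes' Rule.
p_meeting = p_meeting_given_spam * p_spam + p_meeting_given_ham * p_ham
p_spam_given_meeting = p_meeting_given_spam * p_spam / p_meeting

print(p_meeting)                              # 169/5172
print(p_spam_given_meeting)                   # 16/169
print(round(float(p_spam_given_meeting), 4))  # 0.0947
```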
Simulation using bash shell script
• On to demo
• The code is on pages 105-106 of the book; it contains typos, so expect to fix them yourself
A spam filter that combines words: Naïve Bayes
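• Let's extend the one-word algorithm to a model that considers all words…
• For each email, form a bit vector x over the vocabulary, where $x_j = 1$ if word j is present in the email and 0 if it is absent
• Let c denote the class (here, spam)
• Then, assuming the words occur independently of one another given the class (the “naïve” assumption), $P(x|c) = \prod_j \theta_{jc}^{x_j} (1 - \theta_{jc})^{1 - x_j}$, where $\theta_{jc}$ is the probability that word j appears in an email of class c
• Let's understand this with an example, and also turn the product into a sum by taking logs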
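A sketch of the bit vector and the product formula above. The three-word vocabulary and the θ values are invented for illustration, and tokenization is a naive whitespace split.

```python
def bit_vector(email_text, vocabulary):
    """x_j = 1 if vocabulary word j appears in the email, else 0."""
    words = set(email_text.lower().split())     # naive whitespace tokenization
    return [1 if w in words else 0 for w in vocabulary]

def likelihood(x, theta):
    """P(x|c) = prod_j theta_j^x_j * (1 - theta_j)^(1 - x_j),
    treating words as independent given the class."""
    p = 1.0
    for x_j, theta_j in zip(x, theta):
        p *= theta_j ** x_j * (1 - theta_j) ** (1 - x_j)
    return p

vocabulary = ["lottery", "meeting", "won"]   # hypothetical vocabulary
theta_spam = [0.30, 0.01, 0.20]              # hypothetical P(word j present | spam)

x = bit_vector("You have won a lottery", vocabulary)
print(x)                                     # [1, 0, 1]
print(likelihood(x, theta_spam))             # about 0.0594 (= 0.30 * 0.99 * 0.20)
```

Each factor picks θ when the word is present and 1 - θ when it is absent, so absent words also carry evidence.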
Multi-word (contd.)
• …
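• $\log P(x|c) = \sum_j x_j w_j + w_0$, where $w_j = \log \frac{\theta_{jc}}{1 - \theta_{jc}}$ and $w_0 = \sum_j \log(1 - \theta_{jc})$
• The $x_j$ vary from email to email, while the weights depend only on the training counts… can we compute those counts using MapReduce (MR)?
• Once we know P(x|c), we can estimate P(c|x) using Bayes' Rule (P(c) and P(x) can be computed as before); we can also use MR to compute P(x) for the various words (KEY)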
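A single-process sketch of the pipeline on these slides, under several assumptions: the toy training set is hypothetical; Python's built-in `map` and `functools.reduce` only imitate the map and reduce phases of MapReduce (no cluster is involved); and add-one (Laplace) smoothing, which these slides do not cover, keeps every θ strictly between 0 and 1 so each log is defined. Per-class document counts come out of the map/reduce step, θ and the weights w_j, w_0 follow from them, and P(c|x) comes from Bayes' Rule.

```python
import math
from collections import Counter
from functools import reduce

# Hypothetical training emails: (set of distinct words, label).
training = [
    ({"lottery", "won"}, "spam"),
    ({"lottery"}, "spam"),
    ({"meeting", "agenda"}, "ham"),
    ({"meeting"}, "ham"),
    ({"won", "meeting"}, "ham"),
]
vocabulary = ["agenda", "lottery", "meeting", "won"]
classes = ("spam", "ham")

def mapper(example):
    """Map step: emit ((label, word), 1) per distinct word, plus (label, 1) per email."""
    words, label = example
    return [((label, w), 1) for w in words] + [(label, 1)]

def reducer(totals, pairs):
    """Reduce step: add each emitted count into the running totals by key."""
    for key, count in pairs:
        totals[key] += count
    return totals

counts = reduce(reducer, map(mapper, training), Counter())

def theta(label, word):
    """P(word present | label), with add-one (Laplace) smoothing so 0 < theta < 1."""
    return (counts[(label, word)] + 1) / (counts[label] + 2)

def weights(label):
    """w_j = log(theta_j / (1 - theta_j)) and w_0 = sum_j log(1 - theta_j)."""
    thetas = [theta(label, w) for w in vocabulary]
    w = [math.log(t / (1 - t)) for t in thetas]
    w0 = sum(math.log(1 - t) for t in thetas)
    return w, w0

def log_likelihood(x, label):
    """log P(x|c) = sum_j x_j w_j + w_0."""
    w, w0 = weights(label)
    return sum(x_j * w_j for x_j, w_j in zip(x, w)) + w0

def p_class_given_x(x, label):
    """Bayes' Rule: P(c|x) = P(x|c)P(c) / P(x), with P(x) = sum_c P(x|c)P(c)."""
    n_emails = sum(counts[c] for c in classes)
    log_joint = {c: log_likelihood(x, c) + math.log(counts[c] / n_emails) for c in classes}
    top = max(log_joint.values())            # subtract the max before exp for stability
    p_x = sum(math.exp(v - top) for v in log_joint.values())
    return math.exp(log_joint[label] - top) / p_x

x = [0, 1, 0, 1]                             # email containing "lottery" and "won"
print(round(p_class_given_x(x, "spam"), 3))  # 0.936
```

Working with sums of logs instead of products of probabilities matters in practice: with a vocabulary of thousands of words, the raw product underflows to 0.0 in floating point.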
Wrangling
• The rest of the chapter deals with data wrangling
• Very important: it is what we are doing now in Project 1 and Project 2
• Connect to an API and extract data
• DDS Chapter 4 shows an example that pulls New York Times (NYT) articles and classifies them
Summary
• Learn the Naïve Bayes rule
• Application to spam filtering in emails
• Work through and understand the examples discussed in class: the disease test and the spam filter
• Possible question: problem statement → classification model using Naïve Bayes

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 

Naïve Bayes

  • 2. Introduction • We discussed Bayes' rule last class. Here is its derivation from first principles of probability: – $P(A \mid B) = P(A \cap B)/P(B)$ and $P(B \mid A) = P(A \cap B)/P(A)$, so $P(B \mid A)\,P(A) = P(A \cap B)$ – substituting into the first equation gives $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$ • Now let's look at a very common application of Bayes' rule: supervised learning for classification, specifically spam filtering
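As a quick numeric check of the derivation, the sketch below builds a small joint distribution over two binary events A and B (the probabilities are made up for illustration) and confirms that the definition of conditional probability and Bayes' rule give the same P(A|B).

```python
# Made-up joint distribution over two binary events A and B.
joint = {
    (True, True): 0.12,    # P(A and B)
    (True, False): 0.18,   # P(A and not B)
    (False, True): 0.28,   # P(not A and B)
    (False, False): 0.42,  # P(not A and not B)
}

p_a = sum(p for (a, _), p in joint.items() if a)  # marginal P(A)
p_b = sum(p for (_, b), p in joint.items() if b)  # marginal P(B)
p_a_and_b = joint[(True, True)]

# Definitions of conditional probability
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
bayes = p_b_given_a * p_a / p_b

print(p_a_given_b, bayes)  # the two agree up to floating-point rounding
```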
  • 3. Classification • Training set → design a model • Test set → validate the model • Classify the data set using the model • Goal of classification: to assign each item in the set to one of the given, known classes • For spam filtering the problem is binary: spam or not spam (ham)
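The train/validate/classify workflow can be sketched as follows. This is a minimal illustration, not the course's code: `train_test_split`, `accuracy` and `lottery_rule` are hypothetical helpers, the default 20% test fraction is an arbitrary choice, and a "model" is just any function that maps an email's text to a label.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle labeled items and split them into (training set, test set)."""
    shuffled = list(items)                 # copy; the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, test_set):
    """Fraction of (email, label) pairs that the model labels correctly."""
    correct = sum(1 for email, label in test_set if model(email) == label)
    return correct / len(test_set)

def lottery_rule(text):
    """A deliberately trivial stand-in for a trained model (hypothetical)."""
    return "spam" if "lottery" in text else "ham"

# Toy labeled emails (made up for illustration)
labeled = [("you have won a lottery", "spam"), ("meeting at noon", "ham")]
train, test = train_test_split(labeled, test_fraction=0.5)
print(len(train), len(test))
print(accuracy(lottery_rule, labeled))
```

In practice the model would be built from `train` only and scored with `accuracy` on `test`, so the validation numbers reflect emails the model has not seen.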
  • 4. Why not use the methods in ch. 3? • Linear regression predicts a continuous variable, not a binary class • k-NN can accommodate many features, but it runs into the curse of dimensionality: each distinct word is one feature, so 10,000 distinct words means 10,000 features! • What are we going to use instead? Naïve Bayes
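To see why the feature count grows with the vocabulary, here is a small sketch that assigns one feature index per distinct word and encodes an email as a 0/1 vector. Tokenizing by lowercasing and splitting on whitespace is a simplifying assumption, and the example sentences are made up.

```python
def build_vocabulary(emails):
    """Map each distinct lowercase word to a feature index."""
    vocab = {}
    for text in emails:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))  # new word -> next free index
    return vocab

def to_bit_vector(text, vocab):
    """x_j = 1 if word j occurs in the email, else 0 (unknown words are ignored)."""
    x = [0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            x[vocab[word]] = 1
    return x

emails = ["You have won a lottery", "Meeting at noon", "Lottery meeting"]
vocab = build_vocabulary(emails)
print(len(vocab))  # number of features = number of distinct words
print(to_bit_vector("lottery at noon", vocab))
```

Every new distinct word adds another dimension, which is exactly what makes distance-based methods like k-NN struggle on text.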
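  • 5. Let's Review • A rare disease affects 1% of the population • We have a highly sensitive and specific test: – it is positive for 99% of sick patients – it is negative for 99% of non-sick patients • If a patient tests positive, what is the probability that he/she is sick? • Notation: "sick" means the patient is sick; "+" means the test is positive • $P(\text{sick} \mid +) = \frac{P(+ \mid \text{sick})\,P(\text{sick})}{P(+)} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = \frac{0.0099}{2 \times 0.0099} = \frac{1}{2} = 0.5$, where the second term in the denominator is $P(+ \mid \text{not sick})\,P(\text{not sick})$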
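The same calculation as a small function, offered as a sketch (`posterior` is a hypothetical helper name). The denominator uses the law of total probability, with the false-positive rate equal to 1 - specificity.

```python
def posterior(prior, sensitivity, specificity):
    """P(sick | +) via Bayes' rule and the law of total probability."""
    p_pos_given_sick = sensitivity
    p_pos_given_healthy = 1 - specificity  # false-positive rate
    p_pos = p_pos_given_sick * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_sick * prior / p_pos

print(posterior(prior=0.01, sensitivity=0.99, specificity=0.99))
```

With a 1% prior and a 99% sensitive, 99% specific test this returns 0.5 up to floating-point rounding: even a very good test is right only half the time on a positive result when the disease is this rare.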
  • 6. Spam Filter for Individual Words • Classifying mail into spam and not spam is binary classification • Say we get an email that says you have won a “lottery”: right away we know it is spam • For now we assume that if a single word indicates spam, then the email is spam… • $P(\text{spam} \mid \text{word}) = \frac{P(\text{word} \mid \text{spam})\,P(\text{spam})}{P(\text{word})}$
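  • 7. Further discussion • Let's call good emails “ham” • $P(\text{ham}) = 1 - P(\text{spam})$ • $P(\text{word}) = P(\text{word} \mid \text{spam})\,P(\text{spam}) + P(\text{word} \mid \text{ham})\,P(\text{ham})$ (law of total probability)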
  • 8. Sample data • Enron data: https://www.cs.cmu.edu/~enron • Enron employee emails • A small subset chosen for EDA • 1500 spam, 3672 ham • The test word is “meeting”; that is, your goal is to label an email containing the word “meeting” as spam or ham (not spam) • Running a simple shell script shows that “meeting” appears in 16 spam emails and 153 ham emails • What does your intuition say right away? Now prove it using Bayes' rule
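  • 9. Calculations • $P(\text{spam}) = 1500/(1500+3672) \approx 0.29$ • $P(\text{ham}) \approx 0.71$ • $P(\text{meeting} \mid \text{spam}) = 16/1500 \approx 0.0107$ • $P(\text{meeting} \mid \text{ham}) = 153/3672 \approx 0.0417$ • $P(\text{meeting}) = P(\text{meeting} \mid \text{spam})\,P(\text{spam}) + P(\text{meeting} \mid \text{ham})\,P(\text{ham}) \approx 0.0107 \times 0.29 + 0.0417 \times 0.71 \approx 0.0327$ • $P(\text{spam} \mid \text{meeting}) = P(\text{meeting} \mid \text{spam})\,P(\text{spam})/P(\text{meeting}) \approx 0.0107 \times 0.29/0.0327 \approx 0.095$, i.e. about 9.5%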
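The slide's arithmetic can be reproduced directly from the email counts. This is a sketch; `spam_given_word` is a hypothetical name, and the counts are the ones quoted on the sample-data slide.

```python
def spam_given_word(n_spam, n_ham, n_word_spam, n_word_ham):
    """P(spam | word) from email counts, via Bayes' rule."""
    p_spam = n_spam / (n_spam + n_ham)
    p_ham = 1 - p_spam
    p_word_given_spam = n_word_spam / n_spam
    p_word_given_ham = n_word_ham / n_ham
    # Law of total probability for the denominator P(word)
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word

p = spam_given_word(n_spam=1500, n_ham=3672, n_word_spam=16, n_word_ham=153)
print(round(p, 4))  # 0.0947
```

Because the class sizes cancel, the result is exactly 16/(16 + 153) ≈ 0.0947, so rounding the intermediate values only shifts the last digit.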
  • 10. Simulation using a bash shell script • On to the demo • The script is on pages 105-106 of DDS; the printed code contains typos, so expect to debug it
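The book's demo is a bash script that is not reproduced here. As a rough Python equivalent (an assumption, not the DDS code), the sketch below counts how many emails in a directory mention a word, assuming one email per plain-text file. The folder names in the usage comment (`enron1/spam`, `enron1/ham`) are hypothetical; point them at wherever the data is unpacked.

```python
from pathlib import Path

def count_emails_containing(word, directory):
    """Count files in `directory` whose text contains `word`, ignoring case.

    Assumes one email per plain-text file. This is a substring match, like
    `grep -il`, so "meetings" also counts as a hit for "meeting".
    """
    word = word.lower()
    count = 0
    for path in Path(directory).iterdir():
        if path.is_file():
            # latin-1 decodes any byte sequence, so odd encodings won't crash
            text = path.read_text(encoding="latin-1").lower()
            if word in text:
                count += 1
    return count

# Hypothetical usage, assuming spam and ham emails sit in two folders:
# n_spam = count_emails_containing("meeting", "enron1/spam")
# n_ham = count_emails_containing("meeting", "enron1/ham")
```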
  • 11. A Spam Filter That Combines Words: Naïve Bayes • Let's extend the one-word algorithm to a model that considers all words… • Form a bit vector x for each email: $x_j = 1$ if word j is present in the email and 0 if it is absent • Let c denote the class (spam) • Let $\theta_{jc}$ be the probability that word j appears in an email of class c • Assuming the words are independent given the class (the "naïve" assumption), $P(x \mid c) = \prod_j \theta_{jc}^{x_j} (1 - \theta_{jc})^{1 - x_j}$ • Let's understand this with an example, and turn the product into a summation by taking logs…
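The product formula and its log form can be checked numerically. In this sketch the bit vector and the θ values are made up; note that the log form needs every θ_j strictly between 0 and 1.

```python
import math

def bernoulli_likelihood(x, theta):
    """p(x|c) = prod_j theta_j^x_j * (1 - theta_j)^(1 - x_j).

    x: bit vector for one email; theta[j]: P(word j present | class c).
    Treating the words as independent given the class is the naive assumption.
    """
    p = 1.0
    for xj, tj in zip(x, theta):
        p *= tj ** xj * (1 - tj) ** (1 - xj)
    return p

def log_bernoulli_likelihood(x, theta):
    """Same quantity as a sum of logs, which avoids underflow for long vectors."""
    return sum(xj * math.log(tj) + (1 - xj) * math.log(1 - tj)
               for xj, tj in zip(x, theta))

x = [1, 0, 1]            # words 0 and 2 present, word 1 absent
theta = [0.8, 0.1, 0.5]  # made-up per-word probabilities for class c
print(bernoulli_likelihood(x, theta))  # 0.8 * 0.9 * 0.5
print(math.exp(log_bernoulli_likelihood(x, theta)))
```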
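  • 12. Multi-word (contd.) • … • $\log P(x \mid c) = \sum_j x_j w_j + w_0$, where $w_j = \log\frac{\theta_{jc}}{1 - \theta_{jc}}$ and $w_0 = \sum_j \log(1 - \theta_{jc})$ • The bit vector x varies from email to email, while the weights depend only on the words and the class… can we compute them using MapReduce (MR)? • Once we know P(x|c), we can estimate P(c|x) using Bayes' rule (P(c) and P(x) can be computed as before); we can also use MR to compute P(x) for various words (KEY)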
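Putting the pieces together, here is a minimal Bernoulli Naïve Bayes sketch that estimates θ_jc from training bit vectors, converts it to the weights w_j and w_0, and applies Bayes' rule to get P(spam|x). Two choices go beyond the slides and are assumptions: add-one (Laplace) smoothing, so no θ_jc is exactly 0 or 1, and normalizing in log space to avoid underflow. All function names and the toy data are hypothetical.

```python
import math

def train_theta(vectors, alpha=1.0):
    """theta_j = P(word j present | class), with add-alpha smoothing (assumption)."""
    n = len(vectors)
    return [(sum(v[j] for v in vectors) + alpha) / (n + 2 * alpha)
            for j in range(len(vectors[0]))]

def weights(theta):
    """w_j = log(theta_j / (1 - theta_j)); w_0 = sum_j log(1 - theta_j)."""
    w = [math.log(t / (1 - t)) for t in theta]
    w0 = sum(math.log(1 - t) for t in theta)
    return w, w0

def log_likelihood(x, w, w0):
    """log P(x|c) = sum_j x_j w_j + w_0."""
    return sum(xj * wj for xj, wj in zip(x, w)) + w0

def p_spam_given_x(x, spam_vectors, ham_vectors):
    """P(spam | x) via Bayes' rule; P(x) is the sum over both classes."""
    n_spam, n_ham = len(spam_vectors), len(ham_vectors)
    prior_spam = n_spam / (n_spam + n_ham)
    log_a = log_likelihood(x, *weights(train_theta(spam_vectors))) + math.log(prior_spam)
    log_b = log_likelihood(x, *weights(train_theta(ham_vectors))) + math.log(1 - prior_spam)
    m = max(log_a, log_b)  # subtract the max before exponentiating
    a, b = math.exp(log_a - m), math.exp(log_b - m)
    return a / (a + b)

# Toy training bit vectors over the features ["lottery", "meeting"] (made up)
spam_vectors = [[1, 0], [1, 0], [0, 0]]
ham_vectors = [[0, 1], [0, 1], [0, 0], [1, 1]]
print(p_spam_given_x([1, 0], spam_vectors, ham_vectors))  # "lottery" only
print(p_spam_given_x([0, 1], spam_vectors, ham_vectors))  # "meeting" only
```

An email containing only "lottery" comes out well above 0.5 and one containing only "meeting" well below it, which mirrors the single-word intuition but now combines evidence from every word in the vector.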
  • 13. Wrangling • The rest of the chapter deals with data wrangling • This is very important… it is what we are doing now in Project 1 and Project 2 • Connect to an API and extract data • DDS chapter 4 shows an example that extracts New York Times (NYT) data and classifies the articles
  • 14. Summary • Learn Bayes' rule and Naïve Bayes • Its application to spam filtering in emails • Work through and understand the examples discussed in class: the disease test and the spam filter • Possible question: given a problem statement, build a classification model using Naïve Bayes