Machine Learning
k-Nearest Neighbor Classifiers
1-Nearest Neighbor Classifier
Test Examples
(What class to assign this?)
1-Nearest Neighbor
2-Nearest Neighbor
3-Nearest Neighbor
8-Nearest Neighbor
Controlling COMPLEXITY in k-NN
Ingredient   Sweetness   Crunchiness   Food type
apple        10          9             fruit
bacon        1           4             protein
banana       10          1             fruit
carrot       7           10            vegetable
celery       3           10            vegetable
cheese       1           1             protein
Measuring similarity with distance
Locating the tomato's nearest neighbors requires a distance function: a
formula that measures the similarity between two instances.
There are many different ways to calculate distance. Traditionally, the k-NN
algorithm uses Euclidean distance, which is the distance one would measure
if it were possible to use a ruler to connect two points, illustrated in the previous
figure by the dotted lines connecting the tomato to its neighbors.
Euclidean distance
Euclidean distance is specified by the following formula, where p and q are the
examples to be compared, each having n features. The term p_1 refers to the value
of the first feature of example p, while q_1 refers to the value of the first feature of
example q:
$$\mathrm{dist}(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \dots + (p_n - q_n)^2}$$
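As a minimal sketch, the Euclidean distance between two feature vectors can be written in a few lines of Python. The function name `euclidean_distance` is illustrative, not from the slides; the bacon values come from the ingredient table and the tomato values from the deck's k-NN application slide.

```python
import math

def euclidean_distance(p, q):
    """Return the straight-line distance between two equal-length feature vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of features")
    # Sum the squared per-feature differences, then take the square root
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# (sweetness, crunchiness) pairs
tomato = (6, 4)
bacon = (1, 4)
print(euclidean_distance(tomato, bacon))  # 5.0
```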
Application of KNN
Which class does the tomato belong to, given its feature values?
Tomato (sweetness = 6, crunchiness = 4)
K = 3, 5, 7, 9
K = 11, 13, 15, 17
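The tomato question can be sketched as a small k-NN classifier. This is an illustration under stated assumptions: it uses only the six rows of the ingredient table (the figures on the slides presumably plot more points, so the larger K values listed cannot be reproduced here), and `knn_classify` is a hypothetical helper name. Ties in distance or in votes are resolved by table order, which is an arbitrary choice.

```python
import math
from collections import Counter

# Rows from the ingredient table: (name, sweetness, crunchiness, food type)
TRAINING = [
    ("apple", 10, 9, "fruit"),
    ("bacon", 1, 4, "protein"),
    ("banana", 10, 1, "fruit"),
    ("carrot", 7, 10, "vegetable"),
    ("celery", 3, 10, "vegetable"),
    ("cheese", 1, 1, "protein"),
]

def knn_classify(query, training, k):
    """Majority vote among the k training rows closest to query (Euclidean distance)."""
    by_distance = sorted(training, key=lambda row: math.dist(query, row[1:3]))
    votes = Counter(row[3] for row in by_distance[:k])
    return votes.most_common(1)[0][0]

tomato = (6, 4)
print(knn_classify(tomato, TRAINING, k=3))  # protein
```

With these six rows, the three nearest neighbors are bacon, banana, and cheese, so the vote is two protein to one fruit. Raising k toward the size of the training set smooths the decision until every query receives the same majority answer, which is the sense in which k controls model complexity.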
Bayesian Classifiers
Understanding probability
The probability of an event is estimated from the observed data by dividing
the number of trials in which the event occurred by the total number of
trials.
For instance, if it rained 3 out of 10 days with similar conditions as
today, the probability of rain today can be estimated as 3 / 10 = 0.30 or
30 percent.
Similarly, if 10 out of 50 prior email messages were spam, then the
probability of any incoming message being spam can be estimated as
10 / 50 = 0.20 or 20 percent.
Note: The probabilities of all the possible outcomes of a trial must always sum to 1.
For example, given the value P(spam) = 0.20, we can calculate P(ham)
= 1 – 0.20 = 0.80.
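The frequency estimates above amount to a single division. A minimal sketch, with `estimate_probability` as an illustrative name and the counts taken from the rain and spam examples:

```python
def estimate_probability(event_count, trial_count):
    """Estimate P(event) as the fraction of observed trials in which it occurred."""
    if trial_count <= 0:
        raise ValueError("need at least one trial")
    return event_count / trial_count

p_rain = estimate_probability(3, 10)   # rained on 3 of 10 similar days
p_spam = estimate_probability(10, 50)  # 10 of 50 prior messages were spam
p_ham = 1 - p_spam                     # outcomes of a trial must sum to 1
print(p_rain, p_spam, p_ham)
```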
Understanding probability (cont.)
Because an event cannot simultaneously happen and not happen, an event is
always mutually exclusive and exhaustive with its complement.
The complement of event A is typically denoted A^c or A'.
Additionally, the shorthand notation P(¬A) can be used to denote the probability of
event A not occurring, as in P(¬spam) = 0.80. This notation is equivalent to
P(A^c).
Understanding joint probability
Often, we are interested in monitoring several non-mutually exclusive events for
the same trial.
[Figure: Venn diagram of all emails showing Spam (20%), Ham (80%), and Lottery (5%). Region labels: Lottery without appearing in Spam; Lottery appearing in Ham; Lottery appearing in Spam.]
Understanding joint probability
Estimate the probability that both spam and Lottery occur, which can be written as P(spam ∩ Lottery).
The notation A ∩ B refers to the event in which both A and B occur.
Calculating P(spam ∩ Lottery) depends on the joint probability of the two events, or
how the probability of one event is related to the probability of the other.
If the two events are totally unrelated, they are called independent events.
If spam and Lottery were independent, we could easily calculate
P(spam ∩ Lottery), the probability of both events happening at the same time.
Because 20 percent of all the messages are spam, and 5 percent of all the
e-mails contain the word Lottery, we could assume that 1 percent of all
messages are spam with the term Lottery.
More generally, for independent events A and B, the probability of both
happening can be expressed as P(A ∩ B) = P(A) * P(B).
P(spam ∩ Lottery) = P(Lottery) * P(spam) = 0.05 * 0.20 = 0.01
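The product rule for independent events can be checked directly. This sketch only restates the arithmetic from the slide; `joint_if_independent` is an illustrative name, and the result holds only under the independence assumption.

```python
def joint_if_independent(p_a, p_b):
    """P(A ∩ B) under the assumption that A and B are independent."""
    return p_a * p_b

p_spam = 0.20     # from the slide
p_lottery = 0.05  # from the slide
p_both = joint_if_independent(p_spam, p_lottery)
print(round(p_both, 4))  # 0.01
```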
Bayes Rule
• Bayes Rule: The most important equation in ML!!
$$P(\text{Class} \mid \text{Data}) = \frac{P(\text{Class})\, P(\text{Data} \mid \text{Class})}{P(\text{Data})}$$
P(Class | Data): Posterior probability (probability of the class AFTER seeing the data)
P(Class): Class prior
P(Data | Class): Data likelihood given class
P(Data): Data prior (marginal)
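To make the rule concrete, here is a small sketch that applies it to the spam example. P(spam) and P(Lottery) come from the earlier slides; the likelihood P(Lottery | spam) = 0.20 is a hypothetical value chosen only for illustration, since the deck does not give one.

```python
def posterior(prior, likelihood, evidence):
    """Bayes rule: P(class | data) = P(class) * P(data | class) / P(data)."""
    if evidence <= 0:
        raise ValueError("P(data) must be positive")
    return prior * likelihood / evidence

p_spam = 0.20                # class prior, from the slides
p_lottery = 0.05             # data prior (marginal), from the slides
p_lottery_given_spam = 0.20  # hypothetical likelihood, not from the slides

p_spam_given_lottery = posterior(p_spam, p_lottery_given_spam, p_lottery)
print(round(p_spam_given_lottery, 4))  # 0.8
```

Under these assumed numbers, seeing the word Lottery raises the probability of spam from the 20 percent prior to about 80 percent.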
Naïve Bayes Classifier
Conditional Independence
• Simple independence between two variables:
$$P(X_1, X_2) = P(X_1)\, P(X_2)$$
• Class conditional independence assumption:
$$P(X_1, X_2) \neq P(X_1)\, P(X_2)$$
$$P(X_1, X_2 \mid C) = P(X_1 \mid C)\, P(X_2 \mid C)$$
$$P(\text{Fever}, \text{BodyAche}) \neq P(\text{Fever})\, P(\text{BodyAche})$$
$$P(\text{Fever}, \text{BodyAche} \mid \text{Viral}) = P(\text{Fever} \mid \text{Viral})\, P(\text{BodyAche} \mid \text{Viral})$$
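The difference between simple and class-conditional independence can be seen numerically. In the sketch below every probability is a hypothetical value invented for illustration: the two symptoms are generated independently within each class, yet they are dependent once the class is marginalized out.

```python
# Hypothetical numbers, chosen only to illustrate the idea
P_VIRAL = 0.3
P_FEVER_GIVEN = {True: 0.8, False: 0.1}  # P(Fever | Viral = key)
P_ACHE_GIVEN = {True: 0.7, False: 0.1}   # P(BodyAche | Viral = key)

def p_class(viral):
    return P_VIRAL if viral else 1 - P_VIRAL

# Marginals obtained by summing over the class
p_fever = sum(p_class(v) * P_FEVER_GIVEN[v] for v in (True, False))
p_ache = sum(p_class(v) * P_ACHE_GIVEN[v] for v in (True, False))

# Joint built from the class-conditional independence assumption
p_fever_and_ache = sum(
    p_class(v) * P_FEVER_GIVEN[v] * P_ACHE_GIVEN[v] for v in (True, False)
)

print(round(p_fever_and_ache, 4), round(p_fever * p_ache, 4))
```

The two printed values differ, so Fever and BodyAche are not independent overall even though they are independent given Viral.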
Naïve Bayes Classifier
Conditional independence among variables given classes!
• Simplifying assumption
• Baseline model, especially when there is a large number of features
$$P(C \mid X_1, X_2, \dots, X_D) = \frac{P(C)\, P(X_1, X_2, \dots, X_D \mid C)}{\sum_{C'} P(C')\, P(X_1, X_2, \dots, X_D \mid C')} = \frac{P(C) \prod_{d=1}^{D} P(X_d \mid C)}{\sum_{C'} P(C') \prod_{d=1}^{D} P(X_d \mid C')}$$
• Taking log and ignoring denominator:
$$\log P(C \mid X_1, X_2, \dots, X_D) \propto \log P(C) + \sum_{d=1}^{D} \log P(X_d \mid C)$$
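One practical reason to work with logs is numerical: a product of many small likelihoods underflows to zero in floating point, while the sum of their logs stays finite. The sketch below uses 1000 hypothetical per-feature likelihoods of 0.01 each to show the effect.

```python
import math

# Hypothetical per-feature likelihoods P(X_d | C), one per feature
probs = [0.01] * 1000

# Direct product: underflows because 0.01 ** 1000 is far below the smallest float
product = 1.0
for p in probs:
    product *= p

# Sum of logs: the same quantity on a log scale, safely representable
log_sum = sum(math.log(p) for p in probs)

print(product)             # 0.0
print(round(log_sum, 2))
```

Comparing classes by their log scores gives the same ranking as comparing the products, because the logarithm is monotonic.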
Naïve Bayes Classifier for
Categorical Valued Variables
Let’s Naïve Bayes!
#EXMPLS   COLOR   SHAPE      LIKE
20        Red     Square     Y
10        Red     Circle     Y
10        Red     Triangle   N
10        Green   Square     N
5         Green   Circle     Y
5         Green   Triangle   N
10        Blue    Square     N
10        Blue    Circle     N
20        Blue    Triangle   Y
$$\log P(C \mid X_1, X_2, \dots, X_D) \propto \log P(C) + \sum_{d=1}^{D} \log P(X_d \mid C)$$
Class Prior Parameters:
P(Like = Y) = ???
P(Like = N) = ???
Class Conditional Likelihoods:
P(Color = Red | Like = Y) = ????
P(Color = Red | Like = N) = ????
...
P(Shape = Triangle | Like = N) = ????
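One way to fill in these parameters is to count examples in the table. The sketch below assumes plain relative-frequency (maximum-likelihood) estimates with no smoothing, which the slides do not specify; the helper names `prior`, `likelihood`, and `predict` are illustrative. Exact fractions are used so the estimates can be read off directly.

```python
from collections import defaultdict
from fractions import Fraction

# (number of examples, color, shape, like) rows from the table
ROWS = [
    (20, "Red", "Square", "Y"),
    (10, "Red", "Circle", "Y"),
    (10, "Red", "Triangle", "N"),
    (10, "Green", "Square", "N"),
    (5, "Green", "Circle", "Y"),
    (5, "Green", "Triangle", "N"),
    (10, "Blue", "Square", "N"),
    (10, "Blue", "Circle", "N"),
    (20, "Blue", "Triangle", "Y"),
]

class_count = defaultdict(int)  # label -> number of examples
value_count = defaultdict(int)  # (feature, value, label) -> number of examples
for n, color, shape, like in ROWS:
    class_count[like] += n
    value_count[("Color", color, like)] += n
    value_count[("Shape", shape, like)] += n
total = sum(class_count.values())

def prior(label):
    """P(Like = label), estimated as a relative frequency."""
    return Fraction(class_count[label], total)

def likelihood(feature, value, label):
    """P(feature = value | Like = label), estimated as a relative frequency."""
    return Fraction(value_count[(feature, value, label)], class_count[label])

def predict(color, shape):
    """Pick the label with the largest prior times per-feature likelihoods."""
    scores = {
        label: prior(label)
        * likelihood("Color", color, label)
        * likelihood("Shape", shape, label)
        for label in class_count
    }
    return max(scores, key=scores.get)

print(prior("Y"), likelihood("Color", "Red", "Y"), likelihood("Shape", "Triangle", "N"))
# 11/20 6/11 1/3
```

Under these estimates, `predict("Red", "Square")` returns "Y", since 11/20 * 6/11 * 4/11 exceeds 9/20 * 2/9 * 4/9.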
Naïve Bayes Classifier for
Text Classifier
Text Classification Example
• Doc1 = {buy two shirts get one shirt half off}
• Doc2 = {get a free watch. send your contact details now}
• Doc3 = {your flight to chennai is delayed by two hours}
• Doc4 = {you have three tweets from @sachin}
Four Class Problem:
• Spam
• Promotions
• Social
• Main
P(promo | doc1) = 0.84
P(spam | doc2) = 0.94
P(main | doc3) = 0.75
P(social | doc4) = 0.91
Bag-of-Words Representation
• Structured (e.g. multivariate) data: fixed number of features
• Unstructured (e.g. text) data:
  - arbitrary-length documents
  - high-dimensional feature space (many words in vocabulary)
  - sparse (small fraction of vocabulary words present in a doc)
• Bag-of-Words representation:
  - ignore sequential order of words
  - represent as a weighted set: term frequency of each term
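A bag of words is simply a mapping from each term to its frequency. The sketch below builds one for Doc1 from the text classification example. The crude plural stripping in `naive_stem` is an assumption made here so that "shirts" and "shirt" count as one term; the slides do not say how terms are normalized.

```python
from collections import Counter

def naive_stem(token):
    """Strip a trailing 's' from longer tokens (a crude, assumed normalization)."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

def bag_of_words(text):
    """Map each term to its term frequency, ignoring word order."""
    return Counter(naive_stem(token) for token in text.lower().split())

doc1 = "buy two shirts get one shirt half off"
bow = bag_of_words(doc1)
print(bow["shirt"])  # 2
```

Because only counts are kept, any reordering of the words in Doc1 produces the same representation.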
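Naïve Bayes Classifier with BoW
• Make an “independence assumption” about words given the class
$$\begin{aligned}
P(\text{doc1} \mid \text{promo}) &= P(\text{buy}{:}1, \text{two}{:}1, \text{shirt}{:}2, \text{get}{:}1, \text{one}{:}1, \text{half}{:}1, \text{off}{:}1 \mid \text{promo}) \\
&= P(\text{buy}{:}1 \mid \text{promo})\, P(\text{two}{:}1 \mid \text{promo})\, P(\text{shirt}{:}2 \mid \text{promo})\, P(\text{get}{:}1 \mid \text{promo})\, P(\text{one}{:}1 \mid \text{promo})\, P(\text{half}{:}1 \mid \text{promo})\, P(\text{off}{:}1 \mid \text{promo}) \\
&= P(\text{buy} \mid \text{promo})^{1}\, P(\text{two} \mid \text{promo})^{1}\, P(\text{shirt} \mid \text{promo})^{2}\, P(\text{get} \mid \text{promo})^{1}\, P(\text{one} \mid \text{promo})^{1}\, P(\text{half} \mid \text{promo})^{1}\, P(\text{off} \mid \text{promo})^{1}
\end{aligned}$$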
Naïve Bayes Text Classifiers
• Log likelihood of document given class:
$$\text{doc} = \{ tf(w_m) \}_{m=1}^{M}$$
$$tf(w_m) = \text{number of times word } w_m \text{ occurs in doc}$$
$$P(\text{doc} \mid \text{class}) = P(w_1 \mid \text{class})^{tf(w_1)}\, P(w_2 \mid \text{class})^{tf(w_2)} \cdots P(w_M \mid \text{class})^{tf(w_M)}$$
$$\log P(\text{doc} \mid \text{class}) = \sum_{m=1}^{M} tf(w_m) \log P(w_m \mid \text{class})$$
• Parameters in Naïve Bayes text classifiers:
P(w_m | c) = Probability that word w_m occurs in documents of class c
P(shirt | promo), P(free | spam), P(buy | spam), P(buy | promo), ...
Number of parameters = ??
[Figure: two collections of documents, doc1, doc2, ..., docm and doc1, doc2, doc3, ..., docn]
Naïve Bayes Parameters
• Likelihood of a word given class. For each word, each class.
P(w_m | c) = Probability that word w_m occurs in documents of class c
• Estimating these parameters from data:
N(w_m, c) = Number of times word w_m occurs in documents of class c
$$N(\text{free}, \text{promo}) = \sum_{\text{doc} \in \text{promo}} tf(\text{free} \mid \text{doc})$$
$$N(\text{free}, \text{spam}) = \sum_{\text{doc} \in \text{spam}} tf(\text{free} \mid \text{doc})$$
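The counting step and the document log likelihood can be sketched together. The training set below is just the four example documents from the deck, one per class, which is far too small for a real classifier. Two choices here are assumptions not taken from the slides: the counts are normalized by the total word count of the class, and add-one smoothing is applied so that unseen words do not give log(0). No stemming is done, so "shirts" and "shirt" are separate terms.

```python
import math
from collections import Counter, defaultdict

# The four example documents from the deck, one per class
TRAIN = [
    ("promo", "buy two shirts get one shirt half off"),
    ("spam", "get a free watch send your contact details now"),
    ("main", "your flight to chennai is delayed by two hours"),
    ("social", "you have three tweets from @sachin"),
]

def tokenize(text):
    return text.lower().split()

def count_words(train):
    """N(w, c): number of times word w occurs in documents of class c."""
    counts = defaultdict(Counter)
    for label, text in train:
        counts[label].update(tokenize(text))
    return counts

def word_log_probs(counts, alpha=1.0):
    """log P(w | c) with add-alpha smoothing (the smoothing is an assumption)."""
    vocab = sorted({w for c in counts.values() for w in c})
    log_p = {}
    for label, c in counts.items():
        denom = sum(c.values()) + alpha * len(vocab)
        log_p[label] = {w: math.log((c[w] + alpha) / denom) for w in vocab}
    return log_p

def log_likelihood(text, label, log_p):
    """log P(doc | class) = sum over words of tf(w) * log P(w | class)."""
    tf = Counter(tokenize(text))
    # Words outside the training vocabulary are skipped
    return sum(n * log_p[label][w] for w, n in tf.items() if w in log_p[label])

counts = count_words(TRAIN)
log_p = word_log_probs(counts)
query = "free watch"
best = max(log_p, key=lambda label: log_likelihood(query, label, log_p))
print(counts["spam"]["free"], counts["promo"]["free"], best)  # 1 0 spam
```

Since each class has exactly one training document, the class priors are equal and comparing log likelihoods alone is enough to pick the class.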
Bayesian Classifier
Multi-variate real-valued data
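Bayes Rule
$$P(\text{Class} \mid \text{Data}) = \frac{P(\text{Class})\, P(\text{Data} \mid \text{Class})}{P(\text{Data})}$$
P(Class | Data): Posterior probability (probability of the class AFTER seeing the data)
P(Class): Class prior
P(Data | Class): Data likelihood given class
P(Data): Data prior (marginal)
$$P(c \mid x) = \frac{P(c)\, P(x \mid c)}{P(x)}, \qquad x \in \mathbb{R}^{D}$$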
Simple Bayesian Classifier
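$$P(c \mid x) = \frac{P(c)\, P(x \mid c)}{P(x)}$$
Sum:
$$\int_{x \in \mathbb{R}^{D}} P(x \mid c)\, dx = 1$$
Mean:
$$\int_{x \in \mathbb{R}^{D}} x\, P(x \mid c)\, dx = \mu_c$$
Covariance:
$$\int_{x \in \mathbb{R}^{D}} (x - \mu_c)(x - \mu_c)^{T}\, P(x \mid c)\, dx = \Sigma_c$$
$$P(x \mid c) = \mathcal{N}(x \mid \mu_c, \Sigma_c) = \frac{1}{(2\pi)^{D/2}\, |\Sigma_c|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_c)^{T}\, \Sigma_c^{-1}\, (x - \mu_c) \right)$$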
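The Gaussian class-conditional density and the resulting posterior can be sketched in pure Python for the special case D = 2, where the covariance inverse has a closed form. The two classes, their priors, means, and covariances below are hypothetical values for illustration, and the function names are not from the slides.

```python
import math

def gaussian_density_2d(x, mean, cov):
    """Evaluate N(x | mean, cov) for D = 2, with cov given as [[a, b], [c, d]]."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0 or a <= 0:
        raise ValueError("covariance must be positive definite")
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    # (x - mean)^T cov^{-1} (x - mean), using the closed-form 2x2 inverse
    maha = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    # Normalizer (2*pi)^(D/2) * |cov|^(1/2) with D = 2
    norm = (2 * math.pi) * math.sqrt(det)
    return math.exp(-0.5 * maha) / norm

def posterior(x, classes):
    """classes maps label -> (prior, mean, cov); returns label -> P(c | x)."""
    joint = {
        label: prior * gaussian_density_2d(x, mean, cov)
        for label, (prior, mean, cov) in classes.items()
    }
    evidence = sum(joint.values())  # P(x), the marginal over classes
    return {label: value / evidence for label, value in joint.items()}

# Hypothetical two-class problem
CLASSES = {
    "a": (0.5, (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]]),
    "b": (0.5, (3.0, 3.0), [[1.0, 0.0], [0.0, 1.0]]),
}
post = posterior((0.5, 0.5), CLASSES)
print(max(post, key=post.get))  # a
```

For general D, the same formula would need a matrix inverse and determinant from a linear algebra library rather than the 2x2 shortcut used here.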
Controlling COMPLEXITY

best data science courses

  • 3. 1-Nearest Neighbor Classifier Test Examples (What class to assign this?)
  • 9. Ingredient Sweetness Crunchiness Food type apple 10 9 fruit Bacon 1 4 protein banana 10 1 fruit carrot 7 10 vegetable celery 3 10 vegetable cheese 1 1 protein
  • 10.
  • 11.
  • 12.
  • 13. Measuring similarity with distance Locating the tomato's nearest neighbors requires a distance function, or a formula that measures the similarity between the two instances. There are many different ways to calculate distance. Traditionally, the k-NN algorithm uses Euclidean distance, which is the distance one would measure if it were possible to use a ruler to connect two points, illustrated in the previous figure by the dotted lines connecting the tomato to its neighbors.
  • 14. Euclidean distance Euclidean distance is specified by the following formula, where p and q are the examples to be compared, each having n features. The term p1 refers to the value of the first feature of example p, while q1 refers to the value of the first feature of example q:
  • 15. Application of KNN Which Class Tomoto belongs to given the feature values: Tomato (sweetness = 6, crunchiness = 4),
  • 16. K = 3, 5, 7, 9
  • 19. Understanding probability The probability of an event is estimated from the observed data by dividing the number of trials in which the event occurred by the total number of trials For instance, if it rained 3 out of 10 days with similar conditions as today, the probability of rain today can be estimated as 3 / 10 = 0.30 or 30 percent. Similarly, if 10 out of 50 prior email messages were spam, then the probability of any incoming message being spam can be estimated as 10 / 50 = 0.20 or 20 percent. Note: The probability of all the possible outcomes of a trial must always sum to 1 For example, given the value P(spam) = 0.20, we can calculate P(ham) = 1 – 0.20 = 0.80
  • 20. Understanding probability, continued. Because an event cannot simultaneously happen and not happen, an event is always mutually exclusive and exhaustive with its complement. The complement of event A is typically denoted $A^c$ or A'. Additionally, the shorthand notation P(¬A) can be used to denote the probability of event A not occurring, as in P(¬spam) = 0.80. This notation is equivalent to $P(A^c)$.
  • 21. Understanding joint probability. Often, we are interested in monitoring several non-mutually exclusive events for the same trial. [Figure: all emails; Spam 20%, Ham 80%, Lottery 5%]
  • 22. Understanding joint probability. [Figure regions: "Lottery appearing in Spam", "Lottery appearing in Ham", "Lottery without appearing in Spam"] Estimate the probability that both spam and Lottery occur, which can be written as P(spam ∩ Lottery). The notation A ∩ B refers to the event in which both A and B occur.
  • 23. Calculating P(spam ∩ Lottery) depends on the joint probability of the two events, that is, on how the probability of one event is related to the probability of the other. If the two events are totally unrelated, they are called independent events. If spam and Lottery were independent, we could easily calculate P(spam ∩ Lottery), the probability of both events happening at the same time. Because 20 percent of all the messages are spam, and 5 percent of all the e-mails contain the word Lottery, we could assume that 1 percent of all messages are spam with the term Lottery. More generally, for independent events A and B, the probability of both happening can be expressed as P(A ∩ B) = P(A) * P(B). Here, 0.05 * 0.20 = 0.01.
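The arithmetic on the last few slides can be checked with a short sketch. It uses only the counts and percentages quoted in the slides. The variable names are illustrative, and `Fraction` is used only to keep the results exact.

```python
from fractions import Fraction

# Estimated from counts quoted on the slides.
p_spam = Fraction(10, 50)      # 10 of 50 prior messages were spam
p_ham = 1 - p_spam             # complement rule: P(ham) = 1 - P(spam)
p_lottery = Fraction(5, 100)   # 5 percent of messages contain "Lottery"

# Valid ONLY under the independence assumption P(A and B) = P(A) * P(B).
p_spam_and_lottery = p_spam * p_lottery

print(float(p_spam), float(p_ham), float(p_spam_and_lottery))
```

If spam and Lottery are not independent, which is the realistic case, this product is only a rough guess and the joint probability has to be estimated from the data instead.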
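  • 24. Bayes Rule. The most important equation in ML!
       $P(\text{Class} \mid \text{Data}) = \dfrac{P(\text{Class})\, P(\text{Data} \mid \text{Class})}{P(\text{Data})}$
       $P(\text{Class} \mid \text{Data})$: posterior probability (probability of the class AFTER seeing the data).
       $P(\text{Class})$: class prior.
       $P(\text{Data} \mid \text{Class})$: data likelihood given the class.
       $P(\text{Data})$: data prior (marginal).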
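As a rough numeric illustration of Bayes rule, the sketch below reuses P(spam) = 0.20 and P(Lottery) = 0.05 from the earlier slides. The likelihood P(Lottery | spam) is not given anywhere in the deck, so the value used here is purely hypothetical and chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction


def posterior(prior, likelihood, evidence):
    """Bayes rule: P(class | data) = P(class) * P(data | class) / P(data)."""
    return prior * likelihood / evidence


p_spam = Fraction(1, 5)       # class prior, from the slides
p_lottery = Fraction(1, 20)   # data prior (marginal), from the slides
# HYPOTHETICAL value, not from the slides: P(Lottery | spam).
p_lottery_given_spam = Fraction(1, 5)

p_spam_given_lottery = posterior(p_spam, p_lottery_given_spam, p_lottery)
print(float(p_spam_given_lottery))
```

Under that assumed likelihood, seeing the word Lottery raises the probability of spam from the prior of 0.20 to a posterior of 0.80.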
  • 26. Conditional Independence.
       Simple independence between two variables: $P(X_1, X_2) = P(X_1)\, P(X_2)$
       Class-conditional independence assumption: $P(X_1, X_2) \neq P(X_1)\, P(X_2)$, but $P(X_1, X_2 \mid C) = P(X_1 \mid C)\, P(X_2 \mid C)$
       Example: $P(\text{Fever}, \text{BodyAche}) \neq P(\text{Fever})\, P(\text{BodyAche})$, but $P(\text{Fever}, \text{BodyAche} \mid \text{Viral}) = P(\text{Fever} \mid \text{Viral})\, P(\text{BodyAche} \mid \text{Viral})$
  • 27. Naïve Bayes Classifier. Conditional independence among variables given classes! This is a simplifying assumption, and it gives a baseline model, especially when there is a large number of features.
       $P(C \mid X_1, X_2, \dots, X_D) = \dfrac{P(C)\, P(X_1, X_2, \dots, X_D \mid C)}{\sum_{C'} P(C')\, P(X_1, X_2, \dots, X_D \mid C')} = \dfrac{P(C) \prod_{d=1}^{D} P(X_d \mid C)}{\sum_{C'} P(C') \prod_{d=1}^{D} P(X_d \mid C')}$
       Taking the log and ignoring the denominator: $\log P(C \mid X_1, X_2, \dots, X_D) \propto \log P(C) + \sum_{d=1}^{D} \log P(X_d \mid C)$
  • 28. Naïve Bayes Classifier for Categorical Valued Variables
  • 29. Let’s Naïve Bayes!
       #EXMPLS | COLOR | SHAPE | LIKE
       20 | Red | Square | Y
       10 | Red | Circle | Y
       10 | Red | Triangle | N
       10 | Green | Square | N
       5 | Green | Circle | Y
       5 | Green | Triangle | N
       10 | Blue | Square | N
       10 | Blue | Circle | N
       20 | Blue | Triangle | Y
       $\log P(C \mid X_1, X_2, \dots, X_D) \propto \log P(C) + \sum_{d=1}^{D} \log P(X_d \mid C)$
       Class prior parameters: $P(\text{Like} = Y) = ???$, $P(\text{Like} = N) = ???$
       Class-conditional likelihoods: $P(\text{Color} = \text{Red} \mid \text{Like} = Y) = ???$, $P(\text{Color} = \text{Red} \mid \text{Like} = N) = ???$, ..., $P(\text{Shape} = \text{Triangle} \mid \text{Like} = N) = ???$
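One way to fill in the question marks is to count directly from the table. The sketch below is a minimal illustration, assuming plain relative-frequency estimates with no smoothing (the slide does not say which estimator is intended). The names `ROWS`, `fit`, and `posterior` are made up for this example.

```python
from collections import defaultdict
from fractions import Fraction

# (number of examples, color, shape, like) copied from the slide-29 table.
ROWS = [
    (20, "Red", "Square", "Y"), (10, "Red", "Circle", "Y"),
    (10, "Red", "Triangle", "N"), (10, "Green", "Square", "N"),
    (5, "Green", "Circle", "Y"), (5, "Green", "Triangle", "N"),
    (10, "Blue", "Square", "N"), (10, "Blue", "Circle", "N"),
    (20, "Blue", "Triangle", "Y"),
]


def fit(rows):
    """Relative-frequency estimates of P(Like) and P(feature = value | Like)."""
    class_count = defaultdict(int)
    feature_count = defaultdict(int)  # (feature, value, label) -> count
    for n, color, shape, like in rows:
        class_count[like] += n
        feature_count[("Color", color, like)] += n
        feature_count[("Shape", shape, like)] += n
    total = sum(class_count.values())
    prior = {c: Fraction(n, total) for c, n in class_count.items()}
    likelihood = {key: Fraction(n, class_count[key[2]])
                  for key, n in feature_count.items()}
    return prior, likelihood


prior, likelihood = fit(ROWS)


def posterior(color, shape):
    """Naive Bayes posterior over Like for one (color, shape) pair."""
    joint = {c: prior[c]
             * likelihood.get(("Color", color, c), Fraction(0))
             * likelihood.get(("Shape", shape, c), Fraction(0))
             for c in prior}
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}


print("P(Like=Y) =", prior["Y"], " P(Like=N) =", prior["N"])
print("P(Red|Y) =", likelihood[("Color", "Red", "Y")])
print("P(Red|N) =", likelihood[("Color", "Red", "N")])
print("P(Triangle|N) =", likelihood[("Shape", "Triangle", "N")])
print("posterior for (Red, Triangle):", posterior("Red", "Triangle"))
```

The table's own (Red, Triangle) examples are labeled N, yet this sketch gives Like = Y the higher posterior for that combination. The features are not actually independent given the class here, so the example shows what the naïve assumption costs.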
  • 30. Naïve Bayes Classifier for Text Classification
  • 31. Text Classification Example.
       Doc1 = {buy two shirts get one shirt half off}
       Doc2 = {get a free watch. send your contact details now}
       Doc3 = {your flight to chennai is delayed by two hours}
       Doc4 = {you have three tweets from @sachin}
       Four-class problem: Spam, Promotions, Social, Main.
       $P(\text{promo} \mid \text{doc1}) = 0.84$, $P(\text{spam} \mid \text{doc2}) = 0.94$, $P(\text{main} \mid \text{doc3}) = 0.75$, $P(\text{social} \mid \text{doc4}) = 0.91$
  • 32. Bag-of-Words Representation.
       Structured (e.g. multivariate) data: fixed number of features.
       Unstructured (e.g. text) data: arbitrary-length documents; high-dimensional feature space (many words in the vocabulary); sparse (only a small fraction of vocabulary words is present in a document).
       Bag-of-Words representation: ignore the sequential order of words; represent a document as a weighted set, using the term frequency of each term.
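A bag of words can be sketched with a `Counter` over tokens. This is an illustrative sketch: the tokenizer regex and the tiny `NORMALIZE` table (which maps "shirts" to "shirt" so that both count as one term) are assumptions introduced for this example, not part of the slides.

```python
import re
from collections import Counter

# HYPOTHETICAL normalization table: just enough to merge "shirts" into "shirt".
NORMALIZE = {"shirts": "shirt"}


def bag_of_words(text):
    """Term frequencies of a document; word order is discarded."""
    tokens = re.findall(r"[a-z0-9@]+", text.lower())
    return Counter(NORMALIZE.get(token, token) for token in tokens)


doc1 = "buy two shirts get one shirt half off"
print(bag_of_words(doc1))
```

Because only counts are kept, any reordering of the same words produces an identical representation. A real pipeline would replace the hand-written table with a proper stemmer or lemmatizer.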
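  • 33. Naïve Bayes Classifier with BoW. Make an “independence assumption” about words given the class:
       $P(\text{doc1} \mid \text{promo}) = P(\text{buy}{:}1, \text{two}{:}1, \text{shirt}{:}2, \text{get}{:}1, \text{one}{:}1, \text{half}{:}1, \text{off}{:}1 \mid \text{promo})$
       $= P(\text{buy}{:}1 \mid \text{promo})\, P(\text{two}{:}1 \mid \text{promo})\, P(\text{shirt}{:}2 \mid \text{promo})\, P(\text{get}{:}1 \mid \text{promo})\, P(\text{one}{:}1 \mid \text{promo})\, P(\text{half}{:}1 \mid \text{promo})\, P(\text{off}{:}1 \mid \text{promo})$
       $= P(\text{buy} \mid \text{promo})^{1}\, P(\text{two} \mid \text{promo})^{1}\, P(\text{shirt} \mid \text{promo})^{2}\, P(\text{get} \mid \text{promo})^{1}\, P(\text{one} \mid \text{promo})^{1}\, P(\text{half} \mid \text{promo})^{1}\, P(\text{off} \mid \text{promo})^{1}$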
  • 34. Naïve Bayes Text Classifiers. Log-likelihood of a document given a class.
       $\text{doc} = \{ tf(w_m) \}_{m=1}^{M}$, where $tf(w_m)$ = number of times word $w_m$ occurs in doc.
       $P(\text{doc} \mid \text{class}) = P(w_1 \mid \text{class})^{tf(w_1)}\, P(w_2 \mid \text{class})^{tf(w_2)} \cdots P(w_M \mid \text{class})^{tf(w_M)}$
       Parameters in Naïve Bayes text classifiers: $P(w_m \mid c)$ = probability that word $w_m$ occurs in documents of class c, for example $P(\text{shirt} \mid \text{promo})$, $P(\text{free} \mid \text{spam})$, $P(\text{buy} \mid \text{spam})$, $P(\text{buy} \mid \text{promo})$, ...
       Number of parameters = ??
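Taking logs turns the product over words into a sum, log P(doc | class) = Σ tf(w) · log P(w | class), which is also numerically safer for long documents. The sketch below illustrates this. The word probabilities in `WORD_PROB` are hypothetical placeholders, since the slides give no numeric values for them.

```python
import math
from collections import Counter

# HYPOTHETICAL word probabilities, chosen only for illustration.
WORD_PROB = {
    "promo": {"buy": 0.05, "shirt": 0.04, "free": 0.02},
    "spam": {"buy": 0.01, "shirt": 0.001, "free": 0.06},
}


def log_likelihood(tf, word_prob):
    """log P(doc | class) = sum over words of tf(w) * log P(w | class)."""
    return sum(count * math.log(word_prob[word]) for word, count in tf.items())


doc = Counter({"buy": 1, "shirt": 2})
for label, probs in WORD_PROB.items():
    print(label, log_likelihood(doc, probs))
```

A word with probability zero under some class would make the log undefined. Practical implementations usually smooth the estimates to avoid this, which is outside what the slide covers.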
  • 35. Naïve Bayes Parameters. Likelihood of a word given a class: one parameter for each word and each class. [Figure: documents doc1, doc2, ..., docm and doc1, doc2, doc3, ..., docn grouped by class]
       Estimating these parameters from data:
       $P(w_m \mid c)$ = probability that word $w_m$ occurs in documents of class c.
       $N(w_m, c)$ = number of times word $w_m$ occurs in documents of class c.
       $N(\text{free}, \text{promo}) = \sum_{\text{doc} \in \text{promo}} tf(\text{free} \mid \text{doc})$
       $N(\text{free}, \text{spam}) = \sum_{\text{doc} \in \text{spam}} tf(\text{free} \mid \text{doc})$
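The counts N(w, c) can be accumulated by summing term frequencies per class. The sketch below makes two assumptions. First, it treats the four documents from slide 31 as a tiny labeled corpus, using the most probable class shown on that slide as each label. Second, it turns counts into probabilities by dividing by the total word count of the class. That is a common choice, but the slide does not state it.

```python
import re
from collections import Counter, defaultdict

# Assumption: the slide-31 documents, labeled with their most probable class.
LABELED_DOCS = [
    ("promo", "buy two shirts get one shirt half off"),
    ("spam", "get a free watch. send your contact details now"),
    ("main", "your flight to chennai is delayed by two hours"),
    ("social", "you have three tweets from @sachin"),
]


def term_frequencies(text):
    """tf(w | doc): how often each token occurs in one document."""
    return Counter(re.findall(r"[a-z0-9@]+", text.lower()))


def count_words(labeled_docs):
    """N(w, c): sum of tf(w | doc) over all documents of class c."""
    counts = defaultdict(Counter)
    for label, text in labeled_docs:
        counts[label].update(term_frequencies(text))
    return counts


def word_probability(counts, word, label):
    """Assumed estimator: N(w, c) divided by the total word count of class c."""
    return counts[label][word] / sum(counts[label].values())


N = count_words(LABELED_DOCS)
print("N(free, promo) =", N["promo"]["free"])
print("N(free, spam)  =", N["spam"]["free"])
print("P(free | spam) =", word_probability(N, "free", "spam"))
```

With one document per class the estimates are far too coarse to be useful. The point of the sketch is only the bookkeeping: one counter per class, updated document by document.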
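  • 37. Bayes Rule.
       $P(\text{Class} \mid \text{Data}) = \dfrac{P(\text{Class})\, P(\text{Data} \mid \text{Class})}{P(\text{Data})}$
       $P(\text{Class} \mid \text{Data})$: posterior probability (probability of the class AFTER seeing the data). $P(\text{Class})$: class prior. $P(\text{Data} \mid \text{Class})$: data likelihood given the class. $P(\text{Data})$: data prior (marginal).
       For real-valued data: $P(c \mid x) = \dfrac{P(c)\, P(x \mid c)}{P(x)}$, with $x \in \mathbb{R}^D$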
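  • 38. Simple Bayesian Classifier.
       $P(c \mid x) = \dfrac{P(c)\, P(x \mid c)}{P(x)}$
       Sum: $\int_{x \in \mathbb{R}^D} P(x \mid c)\, dx = 1$
       Mean: $\int_{x \in \mathbb{R}^D} x\, P(x \mid c)\, dx = \mu_c$
       Covariance: $\int_{x \in \mathbb{R}^D} (x - \mu_c)(x - \mu_c)^T P(x \mid c)\, dx = \Sigma_c$
       $P(x \mid c) = \mathcal{N}(x \mid \mu_c, \Sigma_c) = \dfrac{1}{(2\pi)^{D/2}\, |\Sigma_c|^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu_c)^T \Sigma_c^{-1} (x - \mu_c) \right)$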
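The Gaussian class-conditional density can be evaluated directly from the formula. The sketch below is restricted to D = 2 so that the determinant and inverse of the covariance matrix can be written out by hand without a linear-algebra library. The function name and the example mean and covariance are illustrative assumptions.

```python
import math


def gaussian_density_2d(x, mean, cov):
    """N(x | mean, cov) for D = 2, with cov given as ((a, b), (c, d))."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    # Quadratic form (x - mean)^T inv (x - mean).
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    # (2*pi)^(D/2) * |cov|^(1/2) with D = 2.
    norm = (2 * math.pi) * math.sqrt(det)
    return math.exp(-0.5 * quad) / norm


identity = ((1.0, 0.0), (0.0, 1.0))
print(gaussian_density_2d((0.0, 0.0), (0.0, 0.0), identity))
```

To classify a point, this density would be evaluated once per class with that class's mean and covariance, multiplied by the class prior, and the largest product chosen, as in the Bayes rule above.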