Bayes’ theorem and logistic regression
2. Bayes’ theorem gives the relationship between the probabilities of A and B, P(A) and P(B), and the conditional probabilities of A given B and B given A, P(A|B) and P(B|A). In its most common form:
P(A|B) = P(B|A) P(A) / P(B)
In the Bayesian interpretation, probability measures a degree of belief. Bayes’ theorem links the belief in a proposition before and after accounting for evidence.
For proposition A and evidence B:
P(A), the prior, is the initial degree of belief in A
P(A|B), the posterior, is the degree of belief having accounted for B
The quotient P(B|A)/P(B) represents the support B provides for A
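The prior, posterior, and support terms above can be illustrated with a small numeric sketch. The numbers here are made up for illustration: a diagnostic test scenario where A is "has the condition" and B is "positive test".

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
# Hypothetical numbers: 1% prevalence, 99% sensitivity, 5% false-positive rate.
p_a = 0.01              # P(A): the prior
p_b_given_a = 0.99      # P(B|A): P(positive test | condition)
p_b_given_not_a = 0.05  # P(B|not A): false-positive rate

# Evidence P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A|B): belief in A after accounting for B
p_a_given_b = p_b_given_a * p_a / p_b

# Support B provides for A: the quotient P(B|A) / P(B)
support = p_b_given_a / p_b

print(round(p_a_given_b, 3))  # -> 0.167
```

Note how a positive test raises the belief from 1% to roughly 17%: the support factor P(B|A)/P(B) is about 16.7, multiplying the prior.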
3. The probability model for a classifier is a conditional model p(C|F1, …, Fn) over a dependent class variable C, with a small number of outcomes or classes, conditioned on several feature variables F1 through Fn.
Problem: a large number of features, or features that can take a large number of values, makes the probability tables infeasible.
Using Bayes’ theorem:
p(C|F1, …, Fn) = p(C) p(F1, …, Fn|C) / p(F1, …, Fn)
In plain English:
posterior = (prior × likelihood) / evidence
4. The denominator does not depend on C, and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be expanded by repeated application of the definition of conditional probability (the chain rule).
Role of the naive condition: assume that each feature Fi is conditionally independent of every other feature Fj, for j ≠ i, given the category C.
5. Under this assumption the joint model can be represented as
p(C, F1, …, Fn) = p(C) p(F1|C) p(F2|C) ⋯ p(Fn|C)
so the conditional distribution over the class variable is
p(C|F1, …, Fn) = (1/Z) p(C) ∏i p(Fi|C)
where Z = p(F1, …, Fn) is a scaling factor that depends only on the feature values.
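A tiny numeric sketch (all probabilities invented) shows how the scaling factor Z works: the unnormalized class scores p(C) ∏i p(Fi|C) sum to Z, so dividing by Z yields the conditional distribution over C.

```python
# Toy Naive Bayes joint model: two classes, two binary features.
priors = {"spam": 0.4, "ham": 0.6}        # p(C)
likelihoods = {                            # p(F_i = 1 | C)
    "spam": [0.8, 0.3],
    "ham":  [0.1, 0.5],
}

features = [1, 1]  # both features present in this example

# Unnormalized score for each class: p(C) * prod_i p(F_i | C)
scores = {}
for c in priors:
    score = priors[c]
    for p, f in zip(likelihoods[c], features):
        score *= p if f else (1 - p)
    scores[c] = score

# Z = p(F1, ..., Fn) is the sum of the scores; dividing by it
# turns the scores into a proper distribution over C.
z = sum(scores.values())
posterior = {c: s / z for c, s in scores.items()}
print(posterior)
```

The posterior values sum to 1 by construction, which is exactly the role of Z.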
6. A Naïve Bayes classifier combines this model with a decision rule.
The most common rule is to pick the hypothesis that is most probable, known as the maximum a posteriori (MAP) rule.
The probability of a document F being in class c is computed as
P(F|c) is the conditional probability of term F occurring in a document of class c.
It is a measure of how much evidence F contributes that c is the correct class.
P(c) is the prior probability of a document occurring in class c.
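The MAP rule for text classification can be sketched as follows. This is a minimal multinomial Naive Bayes with made-up training documents and class names; log probabilities are used to avoid floating-point underflow, and Laplace (add-one) smoothing handles unseen terms.

```python
import math

# Hypothetical training data: class -> list of documents.
train = {
    "sports": ["ball game win", "team win game"],
    "tech":   ["code bug fix", "ship code fast"],
}

vocab = {w for docs in train.values() for d in docs for w in d.split()}

def train_nb(train):
    n_docs = sum(len(docs) for docs in train.values())
    log_prior, log_lik = {}, {}
    for c, docs in train.items():
        log_prior[c] = math.log(len(docs) / n_docs)   # log P(c)
        tokens = [w for d in docs for w in d.split()]
        total = len(tokens) + len(vocab)               # Laplace smoothing
        log_lik[c] = {w: math.log((tokens.count(w) + 1) / total)
                      for w in vocab}                  # log P(term | c)
    return log_prior, log_lik

def classify(doc, log_prior, log_lik):
    # MAP: argmax over c of log P(c) + sum_i log P(F_i | c);
    # out-of-vocabulary words are simply ignored (contribute 0).
    scores = {
        c: log_prior[c] + sum(log_lik[c].get(w, 0.0) for w in doc.split())
        for c in log_prior
    }
    return max(scores, key=scores.get)

lp, ll = train_nb(train)
print(classify("win the game", lp, ll))  # -> sports
```

Summing logs rather than multiplying raw probabilities is the standard trick: for long documents the product of many small P(F|c) terms would underflow to zero.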
7. Statistical classification model
Predicts a binary response, i.e. the outcome of a categorical dependent variable, from one or more predictor variables.
Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables.
Applications:
Medical and social science fields, e.g. the Trauma and Injury Severity Score (TRISS), used to predict mortality in injured patients.
Predicting whether a patient has diabetes based on observed characteristics like age, gender, and BMI.
Predicting whether a person will vote for Congress or the BJP based on age, income, gender, race, and state of residency.
8. Classification
Binomial (or binary) logistic regression deals with variables in which the observed outcome has two possible types, e.g. dead or alive.
The outcome is coded as 0 or 1, which gives a straightforward interpretation.
Multinomial logistic regression deals with situations where there are three or more outcomes.
Logistic regression is used for predicting binary outcomes rather than continuous ones.
The logit transformation takes the natural logarithm of the odds.
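The logit and its inverse can be sketched directly. The coefficients below are invented for illustration: the model assumes logit(p) is linear in a single predictor (say, age), and the inverse transformation (the sigmoid) maps the linear score back to a probability.

```python
import math

def logit(p):
    # Log-odds of probability p: ln(p / (1 - p))
    return math.log(p / (1 - p))

def sigmoid(x):
    # Inverse of the logit: maps any real score to (0, 1)
    return 1 / (1 + math.exp(-x))

# Hypothetical coefficients: logit(p) = b0 + b1 * age
b0, b1 = -4.0, 0.1

def predict_prob(age):
    return sigmoid(b0 + b1 * age)

# At age 40 the linear score is -4.0 + 0.1*40 = 0, i.e. even odds.
print(round(predict_prob(40), 3))  # -> 0.5
```

Because the model is linear on the log-odds scale, each unit increase in the predictor multiplies the odds by a constant factor (here exp(0.1) per year).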
9. Feature selection selects a subset of the terms occurring in the training set and uses this subset as features in text classification.
It serves two main purposes:
First, it makes training and applying the classifier more efficient by decreasing the size of the effective vocabulary.
Second, it increases accuracy by eliminating noise features.
A noise feature is one which, when added to the document representation, increases the classification error on new data.
Feature selection replaces the complex classifier (using all features) with a simpler one (using a subset of features).
10. Mutual information measures how much information the presence or absence of a term contributes to making the correct classification decision on c.
χ² feature selection tests the independence of two events.
Frequency-based feature selection selects the terms that are most common in the class.
Frequency can be defined either as document frequency (the number of documents in class c that contain the term t) or as collection frequency (the number of tokens of t that occur in documents in c).
Document frequency -> Bernoulli model
Collection frequency -> multinomial model
Feature selection for multiple classifiers selects a single set of features instead of a different one for each classifier.
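Frequency-based selection is the simplest of the three criteria to sketch. The documents and class labels below are invented; the sketch ranks terms by document frequency within a class (each document counted once per term, matching the Bernoulli-model notion above) and keeps the top k.

```python
from collections import Counter

# Hypothetical labeled corpus: (class, text) pairs.
docs = [
    ("sports", "team wins the game"),
    ("sports", "the game was close"),
    ("tech",   "the compiler found a bug"),
]

def top_k_by_df(docs, cls, k):
    # Document frequency: count each term at most once per document
    # (hence the set()), restricted to documents of class cls.
    df = Counter()
    for c, text in docs:
        if c == cls:
            df.update(set(text.split()))
    return [t for t, _ in df.most_common(k)]

print(top_k_by_df(docs, "sports", 2))
```

In practice a stop-word list or a criterion like mutual information would demote terms such as "the", which score high on raw frequency despite carrying no class information; that is exactly the noise-feature problem frequency-based selection alone does not solve.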
Editor's Notes
The maximum a posteriori probability estimate is a mode of the posterior distribution. The a posteriori probability of a random event or an uncertain proposition is the conditional probability assigned after the relevant evidence is taken into account. ‘A posteriori’ means taking into account the relevant evidence related to the particular case being examined.