University of Technology
Computer Science Department
Data Classification
Data Mining
Prepared by
SAMMER A. QADER
2018
Contents:
1. Probabilistic Classification
2. Naïve Bayes Classifier
3. Logistic Regression
4. Support Vector Machine
5. Instance-Based Learning
6. References
1. Probabilistic Classification
Probabilistic classifiers construct a model that quantifies the relationship between the
feature variables and the target (class) variable as a probability.
There are many ways in which such a model can be built. Two of the most
popular models are as follows:
1. Bayes classifier (a generative classifier)
2. Logistic regression (a discriminative classifier)
1. Bayes classifier
The Bayes rule is used to model the probability of each value of the target variable for a
given set of feature variables.
It is assumed that the data points within a class are generated from a specific probability
distribution, such as the
• Bernoulli distribution
• Multinomial distribution.
A naive Bayes assumption of class-conditioned feature independence is often (but not
always) used to simplify the modeling.
2. Logistic regression
The target variable is assumed to be drawn from a Bernoulli distribution whose mean is
defined by a parameterized logit function on the feature variables.
Thus, the probability distribution of the class variable is a parameterized function of the
feature variables. This is in contrast to the Bayes model, which assumes a specific
generative model of the feature distribution of each class.
2. Naïve Bayes Classifier
• Naïve Bayes is a supervised learning classifier.
• Naïve Bayes classifiers are a family of simple probabilistic classifiers based on
applying Bayes' theorem with strong (naive) independence assumptions between
the features.
• A Naïve Bayesian model is easy to build, with no complicated iterative
parameter estimation, which makes it useful for very large datasets.
• The Naïve Bayes classifier performs surprisingly well and is widely used because
it often outperforms more sophisticated classification methods.
• It is based on frequency tables.
How does it work?
• Bayes' theorem provides a way of calculating the posterior probability P(c|x) from
P(c), P(x), and P(x|c).
• The Naïve Bayes classifier assumes that the effect of the value of a predictor (x) on a
given class (c) is independent of the values of the other predictors;
• this assumption is called class conditional independence.
• P(c|x): the posterior probability of the class (target) given the predictor (attribute).
• P(x|c): the likelihood, i.e. the probability of the predictor given the class.
• P(c): the prior probability of the class (before seeing any data).
• P(x): the prior probability of the predictor.
Example of Naïve Bayes Classifier
Id   Outlook    Temp   Humidity   Windy   Play Tennis
1    Rainy      Hot    High       False   No
2    Rainy      Hot    High       True    No
3    Overcast   Hot    High       False   Yes
4    Sunny      Mid    High       False   Yes
5    Sunny      Cool   Normal     False   Yes
6    Sunny      Cool   Normal     True    No
7    Overcast   Cool   Normal     True    Yes
8    Rainy      Mid    High       False   No
9    Rainy      Cool   Normal     False   Yes
10   Sunny      Mid    Normal     False   Yes
11   Rainy      Mid    Normal     True    Yes
12   Overcast   Mid    High       True    Yes
13   Overcast   Hot    Normal     False   Yes
14   Sunny      Mid    High       True    No
• Frequency Tables:

Table 1 (Outlook)        Play Tennis
                         Yes    No
Sunny                    3/9    2/5
Overcast                 4/9    0/5
Rainy                    2/9    3/5

Table 2 (Temp)           Play Tennis
                         Yes    No
Hot                      2/9    2/5
Mid                      4/9    2/5
Cool                     3/9    1/5

Table 3 (Humidity)       Play Tennis
                         Yes    No
High                     3/9    4/5
Normal                   6/9    1/5

Table 4 (Windy)          Play Tennis
                         Yes    No
False                    6/9    2/5
True                     3/9    3/5

• Class Probability: Play Tennis
P(Yes) = 9/14
P(No) = 5/14

• Likelihood Tables (the extra column gives the prior probability of each attribute value):

Table 1 (Outlook)        Yes    No     P(x)
Sunny                    3/9    2/5    5/14
Overcast                 4/9    0/5    4/14
Rainy                    2/9    3/5    5/14

Table 2 (Temp)           Yes    No     P(x)
Hot                      2/9    2/5    4/14
Mid                      4/9    2/5    6/14
Cool                     3/9    1/5    4/14

Table 3 (Humidity)       Yes    No     P(x)
High                     3/9    4/5    7/14
Normal                   6/9    1/5    7/14

Table 4 (Windy)          Yes    No     P(x)
False                    6/9    2/5    8/14
True                     3/9    3/5    6/14

Say that we want to calculate the posterior probability of the class (Yes) given
(Sunny), according to the previous equation for P(C|X):
P(C|X) = P(X|C) * P(C) / P(X)
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
= (3/9) * (9/14) / (5/14)
= 0.33 * 0.64 / 0.36
= 0.60
Now let's assume the following data for a new day:

id   Outlook   Temp   Humidity   Windy   Play Tennis
     Rainy     Mid    Normal     True    ?

Likelihood of Yes = P(Outlook=Rainy|Yes) * P(Temp=Mid|Yes) * P(Humidity=Normal|Yes) * P(Windy=True|Yes) * P(Yes)
= 2/9 * 4/9 * 6/9 * 3/9 * 9/14
= 0.014109347
Likelihood of No = P(Outlook=Rainy|No) * P(Temp=Mid|No) * P(Humidity=Normal|No) * P(Windy=True|No) * P(No)
= 3/5 * 2/5 * 1/5 * 3/5 * 5/14
= 0.010285714
Normalizing (dividing by the evidence):
P(Yes) = 0.014109347 / (0.014109347 + 0.010285714) = 0.578368999
P(No) = 0.010285714 / (0.014109347 + 0.010285714) = 0.421631001
P(Yes) > P(No)

id   Outlook   Temp   Humidity   Windy   Play Tennis
     Rainy     Mid    Normal     True    Yes

Since the evidence P(x) is constant and scales both posteriors equally, it does not
affect the classification and can be ignored.
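The same frequency-table calculation can be scripted. The following Python sketch (our illustration, not part of the original slides; all names are our own) rebuilds the counts from the dataset above and normalizes the two likelihoods:

```python
# A minimal sketch that reproduces the Play Tennis calculation with plain counting.
from collections import Counter, defaultdict

data = [  # (Outlook, Temp, Humidity, Windy, Play)
    ("Rainy","Hot","High",False,"No"), ("Rainy","Hot","High",True,"No"),
    ("Overcast","Hot","High",False,"Yes"), ("Sunny","Mid","High",False,"Yes"),
    ("Sunny","Cool","Normal",False,"Yes"), ("Sunny","Cool","Normal",True,"No"),
    ("Overcast","Cool","Normal",True,"Yes"), ("Rainy","Mid","High",False,"No"),
    ("Rainy","Cool","Normal",False,"Yes"), ("Sunny","Mid","Normal",False,"Yes"),
    ("Rainy","Mid","Normal",True,"Yes"), ("Overcast","Mid","High",True,"Yes"),
    ("Overcast","Hot","Normal",False,"Yes"), ("Sunny","Mid","High",True,"No"),
]

class_counts = Counter(row[-1] for row in data)          # counts behind P(c)
feature_counts = defaultdict(Counter)                    # counts behind P(x_i|c)
for *features, label in data:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

def posterior(features):
    """Return P(class | features) for each class, normalized over the classes."""
    scores = {}
    for label, n_label in class_counts.items():
        p = n_label / len(data)                                  # prior P(c)
        for i, value in enumerate(features):
            p *= feature_counts[(i, label)][value] / n_label     # likelihood P(x_i|c)
        scores[label] = p
    total = sum(scores.values())                                 # the evidence P(x)
    return {label: p / total for label, p in scores.items()}

print(posterior(("Rainy", "Mid", "Normal", True)))
# {'Yes': 0.578..., 'No': 0.421...} -> predict Yes, matching the slide.
```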
3. Logistic Regression
Logistic regression is a regression model where the dependent variable (DV) is
categorical. The output can take only two values, "0" and "1" (binary classification),
which represent outcomes such as pass/fail, win/lose, alive/dead or healthy/sick.
Idea 1: Let p(x) be a linear function.
• We are estimating a probability, which must be between 0 and 1.
• Linear functions are unbounded, so this approach doesn't work.
Better idea:
• Set the log-odds (the logit) to a linear function:
log odds = logit(p) = ln( p / (1 - p) ) = β0 + β1 x
Solving for p:
p(x) = e^(β0 + β1 x) / (1 + e^(β0 + β1 x)) = 1 / (1 + e^-(β0 + β1 x))
• This is called the logistic function and it takes values in [0, 1].
• β0 and β1 are estimated from the data; β1 is interpreted as the change in the
log-odds for a unit change in the input feature it is associated with.
• Logit Function:
Logistic regression estimates a logit function: it is used to estimate probabilities of
class membership instead of constructing a squared-error objective.
• The core of logistic regression is the sigmoid (logistic) function, an S-shaped curve.
The sigmoid function wraps the linear function y = mx + b (i.e. y = β0 + β1 x) to
force the output to be between 0 and 1. The output can, therefore, be interpreted as a
probability.
(Figure: the logistic function compared with a linear function.)
To minimize misclassification rates, we predict:
• Y = 1 when p(x) ≥ 0.5 and Y = 0 when p(x) < 0.5
• So Y = 1 when β0 + β1 x is non-negative, and 0 otherwise.
• Logistic regression gives us a linear classifier where the decision boundary
separating the two classes is the solution of β0 + β1 x = 0.
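As a quick illustration, here is a minimal Python sketch of the sigmoid and the 0.5-threshold decision rule described above (the coefficients in the example call are arbitrary, not fitted values from any dataset):

```python
# A small illustrative sketch (assumed, not from the slides) of the sigmoid and the
# resulting decision rule for a single feature x.
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, beta0, beta1, threshold=0.5):
    """Return (probability, predicted class) for one feature value x."""
    p = sigmoid(beta0 + beta1 * x)
    return p, 1 if p >= threshold else 0

# The decision boundary is where beta0 + beta1 * x = 0, i.e. x = -beta0 / beta1.
print(predict(x=2.0, beta0=-1.0, beta1=0.8))   # example values, chosen arbitrarily
```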
Training a Logistic Regression Classifier
The maximum likelihood approach is used to estimate the best-fitting parameters of the
logistic regression model.
In other words, the parameters β0 and β1 are estimated using a technique called
maximum likelihood estimation.
Logistic regression is similar to classical least-squares linear regression.
The difference is that the logit function is used to estimate probabilities of class
membership instead of constructing a squared-error objective. Consequently, instead
of the least-squares optimization in linear regression, a maximum likelihood
optimization model is used for logistic regression.
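For completeness, below is a toy sketch of how maximum likelihood estimation could be carried out with gradient ascent on the log-likelihood. This is our own simplification, not the slides' method: statistical packages solve the same problem with Newton-type (IRLS) methods, and the feature is usually standardized first for stable convergence. The function name and learning-rate value are illustrative assumptions.

```python
# Toy maximum likelihood fitting for one-feature logistic regression.
import math

def fit_logistic(xs, ys, lr=0.001, epochs=50000):
    """Return (beta0, beta1) that approximately maximize the Bernoulli log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            z = max(min(b0 + b1 * x, 35.0), -35.0)        # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))                # model's P(y = 1 | x)
            g0 += y - p                                    # d(log-likelihood)/d(beta0)
            g1 += (y - p) * x                              # d(log-likelihood)/d(beta1)
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1
```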
Example: Suppose that medical researchers are interested in exploring the relationship
between patient age (x) and the presence (1) or absence (0) of a particular disease (y).
The data collected from 20 patients are shown below.
Let's examine the results of the logistic regression of disease on age. The coefficients, that
is, the maximum likelihood estimates of the unknown parameters β0 and β1, are:
β0 = -4.372
β1 = 0.06696
N Age Y
1 25 0
2 29 0
3 30 0
4 31 0
5 32 0
6 41 0
7 41 0
8 42 0
9 44 1
10 49 1
11 50 0
12 59 1
13 60 0
14 62 0
15 68 1
16 72 0
17 79 1
18 80 0
19 81 1
20 84 1
sum 1059 7
This fitted equation may then be used to estimate the probability that the disease
is present in a particular patient given the patient's age. For example, for a 50-
year-old patient, we have
p(50) = e^(-4.372 + 0.06696 * 50) / (1 + e^(-4.372 + 0.06696 * 50)) ≈ 0.26
Thus, the estimated probability that a 50-year-old patient has the disease is
26%, and the estimated probability that the disease is not present is 100% -
26% = 74%. On the other hand, for a 72-year-old patient, we have
p(72) = e^(-4.372 + 0.06696 * 72) / (1 + e^(-4.372 + 0.06696 * 72)) ≈ 0.61
The estimated probability that a 72-year-old patient has the disease is 61%, and
the estimated probability that the disease is not present is 39%.
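A quick way to verify these two figures is to plug the reported coefficients into the logistic function; the short sketch below (ours, not from the slides) does exactly that:

```python
# Check the two probabilities above with the reported coefficients.
import math

def disease_probability(age, beta0=-4.372, beta1=0.06696):
    """Estimated P(disease present | age) from the fitted logistic model."""
    z = beta0 + beta1 * age
    return 1.0 / (1.0 + math.exp(-z))

for age in (50, 72):
    p = disease_probability(age)
    print(f"age {age}: P(disease) = {p:.2f}, P(no disease) = {1 - p:.2f}")
# age 50: P(disease) = 0.26, P(no disease) = 0.74
# age 72: P(disease) = 0.61, P(no disease) = 0.39
```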
4. Linear Support Vector Machine (LSVM): Mathematical Steps
Steps for solving the LSVM:
1. Find the maximum-margin line.
2. Determine the support vectors.
3. Determine (x1, x2) for each support vector.
4. Here we will use vectors augmented with a 1 as a bias input, and for clarity we
will differentiate these with an over-tilde (S̃).
5. Determine the class y to which each support vector belongs: y = +1 for one
group and y = -1 for the other.
6. Find the αi for each support vector by solving the system of linear equations
Σi αi (S̃i · S̃j) = yj (the right-hand side is -1 or +1 depending on the class y
of support vector S̃j), with one equation for each support vector.
7. The hyperplane that discriminates the positive class from the negative class is
given by: w̃ = Σi αi S̃i
8. Our vectors are augmented with a bias.
9. Hence we can read off from w̃ the hyperplane's weights w and its offset b.
10. Therefore the separating hyperplane equation is y = w·x + b.
Fig. 4 (LSVM)
Example for LSVM: Find the separating hyperplane of the support vector machine.
Solution:
• Find the maximum-margin line.
• Determine the support vectors.
Here we select 3 support vectors to start with. They are S1, S2 and S3.
• Determine (x1, x2) for each support vector:
S1 = (2, 1), S2 = (2, -1), S3 = (4, 0)
• Here we will use vectors augmented with a 1 as a bias input, and for clarity we will
differentiate these with an over-tilde:
S̃1 = (2, 1, 1), S̃2 = (2, -1, 1), S̃3 = (4, 0, 1)
• Now we need to find the 3 parameters α1, α2 and α3 from the following 3 linear
equations (S1 and S2 belong to the negative class, y = -1, while S3 belongs to the
positive class, y = +1):
α1 (S̃1·S̃1) + α2 (S̃2·S̃1) + α3 (S̃3·S̃1) = -1
α1 (S̃1·S̃2) + α2 (S̃2·S̃2) + α3 (S̃3·S̃2) = -1
α1 (S̃1·S̃3) + α2 (S̃2·S̃3) + α3 (S̃3·S̃3) = +1
Let's substitute the values S̃1 = (2, 1, 1), S̃2 = (2, -1, 1) and S̃3 = (4, 0, 1) in the
above equations. After simplification we get:
6 α1 + 4 α2 + 9 α3 = -1
4 α1 + 6 α2 + 9 α3 = -1
9 α1 + 9 α2 + 17 α3 = +1
Solving the above 3 simultaneous equations we get:
α1 = α2 = -3.25 and α3 = 3.5
The hyperplane that discriminates the positive class from the negative class is given by:
w̃ = Σi αi S̃i
Substituting the values we get:
w̃ = -3.25 (2, 1, 1) - 3.25 (2, -1, 1) + 3.5 (4, 0, 1) = (1, 0, -3)
Therefore the separating hyperplane equation is y = w·x + b with w = (1, 0)
and offset b = -3.
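The same system can be checked numerically. Below is a small numpy sketch (our own check, not part of the slides) that solves for the α values and recovers w̃:

```python
# Solve the worked LSVM example: find the alphas, then the hyperplane w and offset b.
import numpy as np

S = np.array([[2.0,  1.0, 1.0],    # S~1, class -1
              [2.0, -1.0, 1.0],    # S~2, class -1
              [4.0,  0.0, 1.0]])   # S~3, class +1
y = np.array([-1.0, -1.0, 1.0])

K = S @ S.T                        # matrix of dot products S~i . S~j
alphas = np.linalg.solve(K, y)     # [-3.25, -3.25, 3.5]
w_tilde = alphas @ S               # [1, 0, -3]  ->  w = (1, 0), b = -3
print(alphas, w_tilde)
```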
Example 2:
Factory "ABC" produces very precise, high-quality chip rings whose quality is
measured in terms of curvature and diameter. The result of quality control by experts is
given in the table below.

curvature    diameter    Quality control result
2.947814     6.626878    Passed
2.530388     7.785050    Passed
3.566991     5.651046    Passed
3.156983     5.467077    Passed
2.582346     4.457777    Not-passed
2.155826     6.222343    Not-passed
3.273418     3.520687    Not-passed
2.8100       5.456782    ?

The new chip ring has curvature 2.8100 and diameter 5.456782. Can you solve this
problem by employing SVM?
SOLUTION:
In the above example, the training data consist of two numerical features, curvature
and diameter. For each data point, we also have a predetermined group: Passed or
Not-passed the manual quality control. We are going to create a model to classify the
training data.
(Scatter plot of the training data, curvature vs. diameter, with the Passed class labeled
y = +1 and the Not-passed class labeled y = -1.)
Following the same steps as in the first example, we take one support vector from each
class (coordinates truncated to one decimal place): S1 = (3.2, 3.5) from the Not-passed
class (y = -1) and S2 = (3.1, 5.4) from the Passed class (y = +1).
Augmented with the bias input:
S̃1 = (3.2, 3.5, 1), S̃2 = (3.1, 5.4, 1)
The two linear equations for α1 and α2 are:
α1 (S̃1·S̃1) + α2 (S̃2·S̃1) = -1
α1 (S̃1·S̃2) + α2 (S̃2·S̃2) = +1
Substituting the support vectors and simplifying gives two simultaneous equations whose
solution is approximately:
α1 = -1.6, α2 = 1.2
w̃ = Σi αi S̃i ≈ (0.3, -1.3, -0.4)
so w ≈ (0.3, -1.3) with offset b ≈ -0.4.
To classify the new point (x1, x2) = (2.8, 5.4):
w·x = (0.3, -1.3)·(2.8, 5.4) ≈ -6.2 < 0
The point belongs to the class -1, which means Not-passed.
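As a sanity check, the same data can be fed to an off-the-shelf linear SVM. The sketch below is our own assumption (using scikit-learn, which the slides do not mention): it fits SVC with a linear kernel and predicts the label of the new chip ring. Its fitted coefficients will not match the rounded hand calculation exactly.

```python
# Fit a (nearly) hard-margin linear SVM on the chip-ring data and classify the new ring.
from sklearn.svm import SVC

X = [[2.947814, 6.626878], [2.530388, 7.785050], [3.566991, 5.651046],
     [3.156983, 5.467077], [2.582346, 4.457777], [2.155826, 6.222343],
     [3.273418, 3.520687]]
y = [+1, +1, +1, +1, -1, -1, -1]          # +1 = Passed, -1 = Not-passed

clf = SVC(kernel="linear", C=1e6)          # large C approximates a hard margin
clf.fit(X, y)
print(clf.predict([[2.8100, 5.456782]]))   # predicted label for the new chip ring
```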
5. Instance-Based Learning
Most of the classifiers discussed in the previous sections are eager learners, in which
the classification model is constructed up front and then used to classify a specific test
instance.
• In instance-based learning, the model construction is delayed until the last step,
i.e. until a test instance must be classified. Such classifiers are also referred to as
lazy learners.
• The simplest principle describing instance-based learning is as follows:
Similar instances have similar class labels.
Different Learning Methods
• Eager Learning
  - Learning = acquiring an explicit structure of a classifier on the whole
    training set;
  - Classification = an instance gets a classification using the explicit structure
    of the classifier.
• Instance-Based Learning (Lazy Learning)
  - Learning = storing all training instances;
  - Classification = an instance gets the classification of the nearest stored
    instances to it (see the nearest-neighbor sketch below).
Similar instances have similar class labels.
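To make the lazy-learning idea concrete, here is a minimal 1-nearest-neighbor sketch (illustrative only; the class and function names are our own): "learning" is just storing the instances, and classification copies the label of the closest one.

```python
# Minimal 1-nearest-neighbor classifier using Euclidean distance.
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

class NearestNeighborClassifier:
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)      # lazy learning: just store everything
        return self

    def predict(self, x):
        # label of the stored instance closest to x
        distances = [euclidean(x, xi) for xi in self.X]
        return self.y[distances.index(min(distances))]

clf = NearestNeighborClassifier().fit([(1, 1), (2, 2), (8, 9)], ["A", "A", "B"])
print(clf.predict((7, 8)))   # -> "B"
```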
5.1 Design Variations of Nearest Neighbor Classifiers
Unsupervised Mahalanobis Metric
The value of the matrix A in the distance function is chosen to be the inverse of the
d x d covariance matrix Σ of the data set. The (i, j)th entry of the matrix Σ is the
covariance between the dimensions i and j. Therefore, the Mahalanobis distance
between points X and Y is defined as follows:
Mahalanobis(X, Y) = sqrt( (X - Y) Σ^-1 (X - Y)^T )
The Mahalanobis metric adjusts well to the different scaling of the dimensions and the
redundancies across different features. Even when the data is uncorrelated, the
Mahalanobis metric is useful because it auto-scales for the naturally different ranges of
attributes describing different physical quantities.
How does the Mahalanobis metric work?
1. We need to find the centered matrix for each group.
2. Then, we calculate the covariance matrix of each group.
3. The next step after creating the covariance matrices for group 1 and group 2 is
to calculate the pooled covariance matrix.
4. Finally, we calculate the Mahalanobis distance as the square root of the product
(mean of G1 - mean of G2) x (inverse pooled covariance matrix) x (mean of G1 - mean of G2)^T.
Example:

Group 1           Group 2
x1   y1           x2   y2
2    2            6    5
2    5            7    4
6    5            8    7
7    3            5    6
4    7            5    4
6    4
5    3
4    6
2    5
1    3

Mean
x1    y1    x2    y2
3.9   4.3   6.2   5.2

Number of points in group 1 = M = 10
Number of points in group 2 = N = 5
Total number of points = q = 15
1. We need to find the centered matrix for each group, which can be calculated using
the following formulas:
Centered x = xi - x̄
Centered y = yi - ȳ
The centered groups are:

Group 1              Group 2
x1      y1           x2      y2
-1.90   -2.30        -0.20   -0.20
-1.90    0.70         0.80   -1.20
 2.10    0.70         1.80    1.80
 3.10   -1.30        -1.20    0.80
 0.10    2.70        -1.20   -1.20
 2.10   -0.30
 1.10   -1.30
 0.10    1.70
-1.90    0.70
-2.90   -1.30

2. Then, we calculate the covariance matrix of each group as follows:
C = (1/n) X^T X
where n is the number of data points in the group and X is that group's centered
matrix (one row per point). For group 1 this multiplies the 2 x 10 transpose of the
centered matrix by the 10 x 2 centered matrix and divides by 10.
The result will be:

Covariance of Group 1
        x1      y1
x1      3.89    0.13
y1      0.13    2.21

Covariance of Group 2
        x2      y2
x2      1.36    0.56
y2      0.56    1.36

3. The next step after creating the covariance matrices for group 1 and group 2 is to
calculate the pooled covariance matrix, i.e. the average of the two covariance matrices
weighted by the group sizes: (M x C1 + N x C2) / q.

Pooled Covariance Matrix
        x       y
x       3.05    0.27
y       0.27    1.93

4. Finally, we calculate the Mahalanobis distance by taking the square root of the
product of the mean difference, the inverse of the pooled covariance matrix, and the
transposed mean difference.
Inverse Pooled Covariance Matrix

inverse of [ 3.05  0.27 ]  =  1 / ((3.05 * 1.93) - (0.27 * 0.27))  x  [  1.93  -0.27 ]  =  [  0.332  -0.047 ]
           [ 0.27  1.93 ]                                             [ -0.27   3.05 ]     [ -0.047   0.526 ]

        x        y
x       0.332   -0.047
y      -0.047    0.526

Mean difference (G1 - G2)
x̄1 - x̄2 = 3.9 - 6.2 = -2.3
ȳ1 - ȳ2 = 4.3 - 5.2 = -0.9

Mahalanobis distance
= sqrt( [-2.3  -0.9] x [  0.332  -0.047 ] x [-2.3  -0.9]^T )
                       [ -0.047   0.526 ]
= 1.41
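The whole worked example can be reproduced in a few lines of numpy. The following sketch (ours, not from the slides) computes the group covariances, the pooled covariance, and the distance between the group means:

```python
# Pooled-covariance Mahalanobis distance between the two group means.
import numpy as np

g1 = np.array([[2,2],[2,5],[6,5],[7,3],[4,7],[6,4],[5,3],[4,6],[2,5],[1,3]], float)
g2 = np.array([[6,5],[7,4],[8,7],[5,6],[5,4]], float)

m1, m2 = g1.mean(axis=0), g2.mean(axis=0)            # [3.9, 4.3] and [6.2, 5.2]
c1 = (g1 - m1).T @ (g1 - m1) / len(g1)               # covariance of group 1 (1/n)
c2 = (g2 - m2).T @ (g2 - m2) / len(g2)               # covariance of group 2 (1/n)
pooled = (len(g1) * c1 + len(g2) * c2) / (len(g1) + len(g2))

d = m1 - m2                                           # mean difference [-2.3, -0.9]
dist = np.sqrt(d @ np.linalg.inv(pooled) @ d)
print(round(dist, 2))                                 # 1.41
```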
References:
• Data Mining: The Textbook, Charu C. Aggarwal, IBM T.J. Watson Research Center,
Yorktown Heights, New York, USA.
• https://www.autonlab.org/tutorials/mbl.html
• https://en.wikipedia.org/wiki/Logistic_regression
• http://people.revoledu.com/kardi/tutorial/Similarity/MahalanobisDistance.html
• https://www.mathsisfun.com/algebra/matrix-multiplying.html