University of Technology
Computer Science Department
Data Classification
Data Mining
Prepared by
SAMMER A.QADER
2018
Contents:
1. Probabilistic Classification
2. Naïve Bayes Classifier
3. Logistic Regression
4. Support Vector Machine
5. Instance-Based Learning
6. References
1. Probabilistic Classification
Probabilistic classifiers construct a model that quantifies the relationship between the
feature variables and the target (class) variable as a probability.
There are many ways in which such a modeling can be performed. Two of the most
popular models are as follows:
1. Bayes classifier (a generative classifier)
2. Logistic regression (a discriminative classifier)
1. Bayes classifier
The Bayes rule is used to model the probability of each value of the target variable for a
given set of feature variables.
It is assumed that the data points within a class are generated from a specific probability
distribution such as
• Bernoulli distribution
• Multinomial distribution.
A naive Bayes assumption of class-conditioned feature independence is often (but not
always) used to simplify the modeling.
2. Logistic regression
The target variable is assumed to be drawn from a Bernoulli distribution whose mean is
defined by a parameterized logit function on the feature variables.
Thus, the probability distribution of the class variable is a parameterized function of the
feature variables. This is in contrast to the Bayes model that assumes a specific
generative model of the feature distribution of each class.
2. Naïve Bayes Classifier
• Naïve Bayes is a supervised learning classifier.
• Naïve Bayes classifiers are a family of simple probabilistic classifiers based on
applying Bayes' theorem with strong (naive) independence assumptions between
the features.
• A Naïve Bayesian model is easy to build, with no complicated iterative
parameter estimation, which makes it useful for very large datasets.
• The Naïve Bayes classifier performs surprisingly well and is widely used because
it often outperforms more sophisticated classification methods.
• It is based on frequency tables.
How does it work?
• Bayes' theorem provides a way of calculating the posterior probability P(c|x) from
P(c), P(x), and P(x|c).
• The Naïve Bayes classifier assumes that the effect of the value of a predictor (x) on a
given class (c) is independent of the values of the other predictors; this assumption is
called class conditional independence.
• P (c|x): Posterior Probability of class (target) given predictor (attribute).
• P (x|c): is the Likelihood, which is the probability of the predictor given the class.
• P(c): is the Prior Probability of the class (before seeing any data).
• P(x): is the Prior probability of the predictor.
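Written out, Bayes' theorem combines these four quantities, and the naive independence
assumption factorizes the likelihood over the individual predictors x1, ..., xn (this is the
equation used in the worked example below):

P(c|x) = P(x|c) * P(c) / P(x)
P(c | x1, ..., xn) is proportional to P(c) * P(x1|c) * P(x2|c) * ... * P(xn|c)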
Example of Naïve Bayes Classifier:
Id Outlook Temp Humidity Windy Play Tennis
1 Rainy Hot High False No
2 Rainy Hot High True No
3 Overcast Hot High false Yes
4 Sunny Mid High false Yes
5 Sunny Cool Normal false Yes
6 Sunny Cool Normal True No
7 Overcast Cool Normal True Yes
8 Rainy Mid High false No
9 Rainy Cool Normal false Yes
10 Sunny Mid Normal false Yes
11 Rainy Mid Normal True Yes
12 Overcast Mid High True Yes
13 Overcast Hot Normal false Yes
14 Sunny Mid High True No
• Frequency Tables:

Table 1 (Outlook)        Play tennis: Yes   No
Sunny                                 3/9   2/5
Overcast                              4/9   0/5
Rainy                                 2/9   3/5

Table 2 (Temp)           Play tennis: Yes   No
Hot                                   2/9   2/5
Mid                                   4/9   2/5
Cool                                  3/9   1/5

Table 3 (Humidity)       Play tennis: Yes   No
High                                  3/9   4/5
Normal                                6/9   1/5

Table 4 (Windy)          Play tennis: Yes   No
False                                 6/9   2/5
True                                  3/9   3/5

• Class Probability (Play Tennis):
P(Yes) = 9/14
P(No)  = 5/14

• Likelihood Tables (the same counts, with the marginal probability of each predictor value):

Table 1 (Outlook):  Sunny 3/9, 2/5, 5/14   Overcast 4/9, 0/5, 4/14   Rainy 2/9, 3/5, 5/14
Table 2 (Temp):     Hot 2/9, 2/5, 4/14     Mid 4/9, 2/5, 6/14        Cool 3/9, 1/5, 4/14
Table 3 (Humidity): High 3/9, 4/5, 7/14    Normal 6/9, 1/5, 7/14
Table 4 (Windy):    False 6/9, 2/5, 8/14   True 3/9, 3/5, 6/14

Say that we want to calculate the posterior probability of the class (Yes) given
(Sunny), using the P(C|X) equation above:
P(C|X) = P(X|C) * P(C) / P(X)
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
= (3/9) * (9/14) / (5/14)
= 0.33 * 0.64 / 0.36
= 0.60
Now let's assume the following data of a day:
id Outlook Temp Humidity Windy Play Tennis
Rainy Mid Normal True ?
Likelihood of Yes = P(Outlook=Rainy|Yes) * P(Temp=Mid|Yes) * P(Humidity=Normal|Yes) * P(Windy=True|Yes) * P(Yes)
= 2/9 * 4/9 * 6/9 * 3/9 * 9/14
= 0.014109347
Likelihood of No = P(Outlook=Rainy|No) * P(Temp=Mid|No) * P(Humidity=Normal|No) * P(Windy=True|No) * P(No)
= 3/5 * 2/5 * 1/5 * 3/5 * 5/14
= 0.010285714
Normalizing (dividing by the evidence)
P (Yes) = 0.014109347/ (0.014109347+0.010285714) = 0.578368999
P (No) = 0.010285714/ (0.014109347+0.010285714) =0.421631001
P (Yes) > P (No)
id Outlook Temp Humidity Windy Play Tennis
Rainy Mid Normal True yes
Since the evidence P(x) is constant and scales both posteriors equally, it does not
affect the classification and can be ignored.
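The same table-counting procedure is easy to script. Below is a minimal Python sketch (not
part of the original slides; the names are illustrative) that builds the frequency counts from
the 14-day table and reproduces the two likelihoods and the normalized posteriors for the query day:

# Minimal Naive Bayes sketch for the Play Tennis data (illustrative).
from collections import Counter, defaultdict

# (Outlook, Temp, Humidity, Windy) -> Play Tennis
data = [
    ("Rainy","Hot","High","False","No"),      ("Rainy","Hot","High","True","No"),
    ("Overcast","Hot","High","False","Yes"),  ("Sunny","Mid","High","False","Yes"),
    ("Sunny","Cool","Normal","False","Yes"),  ("Sunny","Cool","Normal","True","No"),
    ("Overcast","Cool","Normal","True","Yes"),("Rainy","Mid","High","False","No"),
    ("Rainy","Cool","Normal","False","Yes"),  ("Sunny","Mid","Normal","False","Yes"),
    ("Rainy","Mid","Normal","True","Yes"),    ("Overcast","Mid","High","True","Yes"),
    ("Overcast","Hot","Normal","False","Yes"),("Sunny","Mid","High","True","No"),
]

class_counts = Counter(row[-1] for row in data)     # class frequencies: Yes 9, No 5
feat_counts = defaultdict(Counter)                  # counts for P(x_i | c)
for *features, label in data:
    for i, value in enumerate(features):
        feat_counts[(i, label)][value] += 1

def score(query, label):
    # Un-normalized posterior: P(c) * product over i of P(x_i | c)
    p = class_counts[label] / len(data)
    for i, value in enumerate(query):
        p *= feat_counts[(i, label)][value] / class_counts[label]
    return p

query = ("Rainy", "Mid", "Normal", "True")
scores = {c: score(query, c) for c in class_counts}
total = sum(scores.values())
for c, s in scores.items():
    print(c, s, s / total)   # Yes: ~0.0141 -> 0.578, No: ~0.0103 -> 0.422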
3. Logistic Regression
Logistic regression is a regression model where the dependent variable (DV) is
categorical. The output can take only two values, "0" and "1" (binary classification),
which represent outcomes such as pass/fail, win/lose, alive/dead or healthy/sick.
Idea 1: Let p(x) be a linear function.
• We are estimating a probability, which must lie between 0 and 1.
• Linear functions are unbounded, so this approach doesn't work.
Better idea:
• Set the log-odds to a linear function:
log odds = logit(p) = ln( p / (1 − p) ) = β0 + β1·x
Solving for p:
p(x) = e^(β0 + β1·x) / (1 + e^(β0 + β1·x)) = 1 / (1 + e^−(β0 + β1·x))
• This is called the logistic function, and it takes values in [0, 1].
• β0 and β1 are estimated from the data; β1 gives the change in the log-odds for a unit
change in the input feature it is associated with.
Logit Function:
Logistic regression estimates the logit model: the logit function is used to estimate
probabilities of class membership instead of constructing a squared-error objective. The
core of logistic regression is the sigmoid function
σ(z) = 1 / (1 + e^−z)
The sigmoid function wraps the linear function y = mx + b (i.e. y = β0 + β1·x) to force
the output to lie between 0 and 1. The output can therefore be interpreted as a
probability.
[Figure: the logistic (sigmoid) function compared with a linear function]
To minimize misclassification rates, we predict:
• Y = 1 when p(x) ≥ 0.5 and Y = 0 when p(x) < 0.5
• So Y = 1 when β0 + β1·x is non-negative and Y = 0 otherwise
• Logistic regression therefore gives a linear classifier: the decision boundary
separating the two classes is the solution of β0 + β1·x = 0
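As a small illustration (not part of the original slides; the function names are arbitrary),
the sigmoid and the 0.5-threshold decision rule can be written directly in Python:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, b0, b1):
    p = sigmoid(b0 + b1 * x)       # p(x) = 1 / (1 + e^-(b0 + b1*x))
    return 1 if p >= 0.5 else 0    # equivalently: 1 exactly when b0 + b1*x >= 0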
3.1 Training a Logistic Regression Classifier
The maximum likelihood approach is used to estimate the best-fitting parameters of the
logistic regression model. In other words, the parameters β0 and β1 are estimated using a
technique called maximum likelihood estimation.
Logistic regression is similar to classical least-squares linear regression. The
difference is that the logit function is used to estimate probabilities of class
membership instead of constructing a squared-error objective. Consequently, instead
of the least-squares optimization used in linear regression, a maximum likelihood
optimization model is used for logistic regression.
Example: Suppose that medical researchers are interested in exploring the relationship
between patient age (x) and the presence (1) or absence (0) of a particular disease (y).
The data collected from 20 patients are shown below.
Let's examine the results of the logistic regression of disease on age. The coefficients, that
is, the maximum likelihood estimates of the unknown parameters β0 and β1, are:
β0 = −4.372
β1 = 0.06696
N Age Y
1 25 0
2 29 0
3 30 0
4 31 0
5 32 0
6 41 0
7 41 0
8 42 0
9 44 1
10 49 1
11 50 0
12 59 1
13 60 0
14 62 0
15 68 1
16 72 0
17 79 1
18 80 0
19 81 1
20 84 1
sum 1059 7
The fitted model p(x) = 1 / (1 + e^−(β0 + β1·x)) may then be used to estimate the probability
that the disease is present in a particular patient given the patient's age. For example,
for a 50-year-old patient, we have
p(50) = 1 / (1 + e^−(−4.372 + 0.06696·50)) ≈ 0.26
Thus, the estimated probability that a 50-year-old patient has the disease is
26%, and the estimated probability that the disease is not present is 100% −
26% = 74%. On the other hand, for a 72-year-old patient, we have
p(72) = 1 / (1 + e^−(−4.372 + 0.06696·72)) ≈ 0.61
The estimated probability that a 72-year-old patient has the disease is 61%, and
the estimated probability that the disease is not present is 39%.
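The calculation above can be reproduced with a few lines of Python (an illustrative sketch,
not from the slides; it simply plugs the reported maximum-likelihood estimates into the
logistic function):

import math

# Maximum-likelihood estimates reported above (they could be reproduced with any
# logistic-regression fitter, e.g. scikit-learn, applied to the 20 rows of the table).
b0, b1 = -4.372, 0.06696

def p_disease(age):
    # p(x) = 1 / (1 + e^-(b0 + b1*x)): probability that the disease is present
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))

print(round(p_disease(50), 2))   # ~0.26
print(round(p_disease(72), 2))   # ~0.61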
4. Linear Support Vector Machine (LSVM)
Mathematical steps for solving the LSVM:
1. Find the maximum-margin linear separator.
2. Determine the support vectors.
3. Determine (x, y) for each support vector.
4. Here we will use vectors augmented with a 1 as a bias input, and for clarity we
will differentiate these with an over-tilde (written S̃ below).
5. Determine the class to which each support vector belongs: support vectors of the
positive class get the label +1, those of the negative class get −1.
6. Find the αi for each support vector by solving, for every support vector S̃j, the equation
α1 (S̃1 · S̃j) + α2 (S̃2 · S̃j) + ... = yj, where yj is +1 or −1 depending on the class of S̃j.
7. The hyperplane that discriminates the positive class from the negative class is
given by: w̃ = Σ αi S̃i
8. Our vectors are augmented with a bias.
9. Hence we can read the last entry of w̃ as the offset b of the hyperplane.
10. Therefore the separating hyperplane equation is y = w·x + b.
Fig.4 (LSVM)
Example (LSVM): find the separating hyperplane.
Solution:
Find the maximum-margin linear separator.
Determine the support vectors.
Here we select 3 support vectors to start with. They are S1, S2 and S3.
Determine (x1, x2) for each support vector.
S1 = (2, 1),   S2 = (2, −1),   S3 = (4, 0)
Here we will use vectors augmented with a 1 as a bias input, and for clarity we will
differentiate these with an over-tilde:
S̃1 = (2, 1, 1),   S̃2 = (2, −1, 1),   S̃3 = (4, 0, 1)
Now we need to find 3 parameters α1, α2, and α3 based on the following 3 linear
equations (S̃1 and S̃2 belong to the −1 class, S̃3 to the +1 class):
α1 (S̃1 · S̃1) + α2 (S̃2 · S̃1) + α3 (S̃3 · S̃1) = −1
α1 (S̃1 · S̃2) + α2 (S̃2 · S̃2) + α3 (S̃3 · S̃2) = −1
α1 (S̃1 · S̃3) + α2 (S̃2 · S̃3) + α3 (S̃3 · S̃3) = +1
Let's substitute the values S̃1 = (2, 1, 1), S̃2 = (2, −1, 1) and S̃3 = (4, 0, 1) in the
above equations. After simplification we get:
6 α1 + 4 α2 + 9 α3 = −1
4 α1 + 6 α2 + 9 α3 = −1
9 α1 + 9 α2 + 17 α3 = +1
Solving the above 3 simultaneous equations we get:
α1 = α2 = −3.25 and α3 = 3.5
The hyperplane that discriminates the positive class from the negative class is given by:
w̃ = Σ αi S̃i
Substituting the values we get:
w̃ = −3.25·(2, 1, 1) − 3.25·(2, −1, 1) + 3.5·(4, 0, 1) = (1, 0, −3)
Therefore the separating hyperplane equation is y = w·x + b with w = (1, 0)
and offset b = −3.
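As a numerical check (an illustrative NumPy sketch, not part of the original slides), the
same system can be solved directly; the Gram matrix of the augmented support vectors
supplies the coefficients of the linear equations:

import numpy as np

# Augmented support vectors (bias component = 1); S1, S2 are class -1, S3 is class +1.
S = np.array([[2.0,  1.0, 1.0],
              [2.0, -1.0, 1.0],
              [4.0,  0.0, 1.0]])
y = np.array([-1.0, -1.0, 1.0])

G = S @ S.T                       # Gram matrix of dot products S~i . S~j
alpha = np.linalg.solve(G, y)     # alpha1 = alpha2 = -3.25, alpha3 = 3.5
w_tilde = alpha @ S               # [1, 0, -3]  ->  w = (1, 0), b = -3
print(alpha, w_tilde)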
Example 2:
Factory "ABC" produces very precise, high-quality chip rings whose quality is
measured in terms of curvature and diameter. The results of quality control by experts are
given in the table below.
curvature diameter Quality control result
2.947814 6.626878 Passed
2.530388 7.785050 Passed
3.566991 5.651046 Passed
3.156983 5.467077 Passed
2.582346 4.457777 Not-passed
2.155826 6.222343 Not-passed
3.273418 3.520687 Not-passed
2.8100 5.456782 ?
The new chip ring has curvature 2.8100 and diameter 5.456782. Can you solve this
problem by employing SVM?
SOLUTION:
In the above example, the training data consist of two numerical features, curvature
and diameter. For each data point we also have a predetermined group: Passed or Not-passed
the manual quality control. We are going to build a model to classify the training data.
[Scatter plot of the training data: curvature (horizontal axis) vs. diameter (vertical axis),
with the Passed class labeled y = +1 and the Not-passed class labeled y = −1.]
Two support vectors are used, one from each class:
S1 = (3.2, 3.5)  (Not-passed, y = −1)        S2 = (3.1, 5.4)  (Passed, y = +1)
Augmenting each with a 1 as a bias input:
S̃1 = (3.2, 3.5, 1)        S̃2 = (3.1, 5.4, 1)
The parameters α1 and α2 follow from:
α1 (S̃1 · S̃1) + α2 (S̃2 · S̃1) = −1
α1 (S̃1 · S̃2) + α2 (S̃2 · S̃2) = +1
Substituting the vectors:
α1 (3.2·3.2 + 3.5·3.5 + 1) + α2 (3.1·3.2 + 5.4·3.5 + 1) = −1
α1 (3.2·3.1 + 3.5·5.4 + 1) + α2 (3.1·3.1 + 5.4·5.4 + 1) = +1
which simplifies to:
23.49 α1 + 29.82 α2 = −1
29.82 α1 + 39.77 α2 = +1
Solving the two simultaneous equations gives:
α1 ≈ −1.55        α2 ≈ 1.19
w̃ = Σ αi S̃i = (−1.55)·(3.2, 3.5, 1) + (1.19)·(3.1, 5.4, 1) ≈ (−1.28, 0.99, −0.36)
so w ≈ (−1.28, 0.99) and the offset is b ≈ −0.36.
To classify the new point (x1, x2) = (2.81, 5.46):
w·x + b ≈ (−1.28)(2.81) + (0.99)(5.46) − 0.36 ≈ 1.4 > 0
The point falls on the positive side of the hyperplane, so it belongs to class +1, which means Passed.
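The same numerical check for this example (an illustrative NumPy sketch, not part of the
original slides):

import numpy as np

# Augmented support vectors: S1 = Not-passed (y = -1), S2 = Passed (y = +1).
S = np.array([[3.2, 3.5, 1.0],
              [3.1, 5.4, 1.0]])
y = np.array([-1.0, 1.0])

alpha = np.linalg.solve(S @ S.T, y)     # approx [-1.55, 1.19]
w_tilde = alpha @ S                     # approx [-1.28, 0.99, -0.36]

x_new = np.array([2.81, 5.46, 1.0])     # new chip ring, bias-augmented
print(w_tilde @ x_new)                  # approx +1.4 > 0  ->  class +1 (Passed)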
5. Instance-Based Learning
Most of the classifiers discussed in the previous sections are eager learners, in which
the classification model is constructed up front and then used to classify a specific test
instance.
• In instance-based learning, the work is delayed until classification time. Such
classifiers are also referred to as lazy learners.
• The simplest principle describing instance-based learning is as follows:
Similar instances have similar class labels.
Different Learning Methods
• Eager Learning
– Learning = acquiring an explicit structure of a classifier on the whole
training set;
– Classification = an instance gets a classification using the explicit structure
of the classifier.
• Instance-Based Learning (Lazy Learning)
– Learning = storing all training instances
– Classification = an instance is assigned the class of the nearest stored
instance(s).
Similar instances have similar class labels.
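A lazy learner in its simplest form just stores the training set and defers all work to
query time. A minimal 1-nearest-neighbor sketch in Python (the training points and labels
here are made up purely for illustration):

import math

# "Learning" = storing the labeled instances as-is.
train = [((2.0, 2.0), "A"), ((2.0, 5.0), "A"), ((6.0, 5.0), "B"), ((7.0, 4.0), "B")]

def classify_1nn(x):
    # "Classification" = return the label of the closest stored instance (Euclidean distance).
    _, label = min(train, key=lambda t: math.dist(t[0], x))
    return label

print(classify_1nn((6.5, 4.5)))   # "B": the nearest stored instance decides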
5.1 Design Variations of Nearest Neighbor Classifiers
Unsupervised Mahalanobis Metric
In the weighted Euclidean distance Dist(X, Y) = sqrt( (X − Y) A (X − Y)ᵀ ), the matrix A is
chosen to be the inverse of the d × d covariance matrix Σ of the data set. The (i, j)th entry
of the matrix Σ is the covariance between dimensions i and j. Therefore, the Mahalanobis
distance is defined as follows:
Maha(X, Y) = sqrt( (X − Y) Σ⁻¹ (X − Y)ᵀ )
The Mahalanobis metric adjusts well to the different scaling of the dimensions and to the
redundancies across different features. Even when the data is uncorrelated, the
Mahalanobis metric is useful because it auto-scales for the naturally different ranges of
attributes describing different physical quantities.
How does the Mahalanobis metric work?
1. We need to find the centered matrix for each group.
2. Then we calculate the covariance matrix of each group (the formula is given in the
worked example below).
3. The next step, after creating the covariance matrices for group 1 and group 2, is
to calculate the pooled covariance matrix.
4. Finally, we calculate the Mahalanobis distance as the square root of the product of
the mean difference between G1 and G2, the inverse of the pooled covariance matrix, and
the transposed mean difference.
Example:
Group 1 (x1, y1): (2, 2), (2, 5), (6, 5), (7, 3), (4, 7), (6, 4), (5, 3), (4, 6), (2, 5), (1, 3)
Group 2 (x2, y2): (6, 5), (7, 4), (8, 7), (5, 6), (5, 4)
Means: x̄1 = 3.9, ȳ1 = 4.3, x̄2 = 6.2, ȳ2 = 5.2
Number of points in group 1: M = 10; in group 2: N = 5; total: q = 15
1. We need to find the centered matrix for each group, which can be calculated using
the following formula:
centered x = x − x̄,   centered y = y − ȳ
The centered groups are:
Group 1 (x1, y1): (−1.9, −2.3), (−1.9, 0.7), (2.1, 0.7), (3.1, −1.3), (0.1, 2.7),
(2.1, −0.3), (1.1, −1.3), (0.1, 1.7), (−1.9, 0.7), (−2.9, −1.3)
Group 2 (x2, y2): (−0.2, −0.2), (0.8, −1.2), (1.8, 1.8), (−1.2, 0.8), (−1.2, −1.2)
2. Then we calculate the covariance matrix of each group:
Cov = (1/n) · Xcᵀ · Xc
where n is the number of data points in the group and Xc is its centered data matrix
(one row per point, one column per variable). For group 1 this means multiplying the 2×10
transpose of the centered matrix above by the 10×2 centered matrix and dividing by 10;
similarly for group 2 with n = 5.
The result will be:
Covariance of Group 1
x1 y1
x1 3.89 0.13
y1 0.13 2.21
Covariance of Group 2
x2 y2
x2 1.36 0.56
y2 0.56 1.36
3. The next step, after creating the covariance matrices for group 1 and group 2, is to
calculate the pooled covariance matrix, weighting each group by its share of the data:
pooled = (M/q)·Cov(G1) + (N/q)·Cov(G2)
Pooled Covariance Matrix
      x      y
x    3.05   0.27
y    0.27   1.93
4. Finally, we calculate the Mahalanobis distance as the square root of the product of the
mean difference between G1 and G2, the inverse of the pooled covariance matrix, and the
transposed mean difference.
Inverse of the pooled covariance matrix:
inv( [[3.05, 0.27], [0.27, 1.93]] )
= 1 / ((3.05·1.93) − (0.27·0.27)) · [[1.93, −0.27], [−0.27, 3.05]]
= [[0.332, −0.047], [−0.047, 0.526]]
Mean difference (G1 − G2):
x̄1 − x̄2 = 3.9 − 6.2 = −2.3
ȳ1 − ȳ2 = 4.3 − 5.2 = −0.9
Mahalanobis distance
= sqrt( [−2.3, −0.9] · [[0.332, −0.047], [−0.047, 0.526]] · [−2.3, −0.9]ᵀ )
= sqrt(1.99) ≈ 1.41
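The whole calculation can be reproduced with NumPy (an illustrative sketch, not part of the
original slides; it follows the slide's convention of dividing by n rather than n − 1):

import numpy as np

g1 = np.array([[2,2],[2,5],[6,5],[7,3],[4,7],[6,4],[5,3],[4,6],[2,5],[1,3]], float)
g2 = np.array([[6,5],[7,4],[8,7],[5,6],[5,4]], float)

def cov(g):
    # (1/n) * Xc^T Xc, with Xc the centered data matrix of the group
    xc = g - g.mean(axis=0)
    return xc.T @ xc / len(g)

pooled = (len(g1) * cov(g1) + len(g2) * cov(g2)) / (len(g1) + len(g2))
diff = g1.mean(axis=0) - g2.mean(axis=0)            # [-2.3, -0.9]
d = np.sqrt(diff @ np.linalg.inv(pooled) @ diff)    # ~1.41
print(pooled, d)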
References:
• Charu C. Aggarwal, Data Mining: The Textbook, IBM T. J. Watson Research Center, Yorktown Heights, New York, USA.
• https://www.autonlab.org/tutorials/mbl.html
• https://en.wikipedia.org/wiki/Logistic_regression
• http://people.revoledu.com/kardi/tutorial/Similarity/MahalanobisDistance.html
• https://www.mathsisfun.com/algebra/matrix-multiplying.html