Introduction to Pattern Analysis
Ricardo Gutierrez-Osuna
Texas A&M University
LECTURE 8: Nearest Neighbors
g Nearest Neighbors density estimation
g The k Nearest Neighbors classification rule
g kNN as a lazy learner
g Characteristics of the kNN classifier
g Optimizing the kNN classifier
Non-parametric density estimation: review
g Recall from the previous lecture that the general expression for non-
parametric density estimation is
g At that time, we mentioned that this estimate could be computed by
n Fixing the volume V and determining the number k of data points inside V
g This is the approach used in Kernel Density Estimation
n Fixing the value of k and determining the minimum volume V that encompasses
k points in the dataset
g This gives rise to the k Nearest Neighbor (kNN) approach, which is the subject of this
lecture





    p(x) ≅ k / (NV)

    where V is the volume surrounding x, N is the total number of examples, and k is the number of examples inside V
kNN Density Estimation (1)
g In the kNN method we grow the volume surrounding the estimation
point x until it encloses a total of k data points
g The density estimate then becomes

    P(x) ≅ k / (NV) = k / (N · cD · Rk(x)^D)

n Rk(x) is the distance between the estimation point x and its k-th closest neighbor
n cD is the volume of the unit sphere in D dimensions, which is equal to

    cD = π^(D/2) / Γ(D/2 + 1) = π^(D/2) / (D/2)!

g Thus c1 = 2, c2 = π, c3 = 4π/3 and so on
[Figure: a circle of radius R around x encloses the k nearest points; its volume is Vol = πR², so in two dimensions the estimate becomes P(x) = k / (NπR²)]
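As a concrete illustration, the estimate above translates almost directly into code. A minimal sketch (NumPy/SciPy assumed; the function name and the Euclidean metric are illustrative choices, not from the original slides):

    import numpy as np
    from scipy.special import gamma

    def knn_density(x, data, k):
        """kNN density estimate: P(x) = k / (N * c_D * R_k(x)**D)."""
        N, D = data.shape
        dists = np.linalg.norm(data - x, axis=1)   # distance from x to every example
        R_k = np.sort(dists)[k - 1]                # distance to the k-th closest neighbor
        c_D = np.pi ** (D / 2) / gamma(D / 2 + 1)  # volume of the unit sphere in D dimensions
        return k / (N * c_D * R_k ** D)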
kNN Density Estimation (2)
g In general, the estimates that can be obtained with the kNN
method are not very satisfactory
n The estimates are prone to local noise
n The method produces estimates with very heavy tails
n Since the function Rk(x) is not differentiable, the density estimate will
have discontinuities
n The resulting density is not a true probability density since its integral
over all the sample space diverges
g These properties are illustrated in the next few slides
kNN Density Estimation, example 1
g To illustrate the behavior of kNN we generated several density estimates for a univariate
mixture of two Gaussians: P(x)=½N(0,1)+½N(10,4) and several values of N and k
kNN Density Estimation, example 2 (a)
g The performance of the kNN
density estimation technique on
two dimensions is illustrated in
these figures
n The top figure shows the true density,
a mixture of two bivariate Gaussians
n The bottom figure shows the density
estimate for k=10 neighbors and
N=200 examples
n In the next slide we show the
contours of the two distributions
overlapped with the training data
used to generate the estimate
    p(x) = ½ N(µ1, Σ1) + ½ N(µ2, Σ2)   with
    µ1 = [0 5]^T,  Σ1 = [1 1; 1 2]
    µ2 = [5 0]^T,  Σ2 = [1 −1; −1 4]
kNN Density Estimation, example 2 (b)
[Figure: true density contours and kNN density estimate contours, overlaid with the training data]
kNN Density Estimation as a Bayesian classifier
g The main advantage of the kNN method is that it leads to a very simple
approximation of the (optimal) Bayes classifier
n Assume that we have a dataset with N examples, Ni from class ωi, and that we
are interested in classifying an unknown sample xu
g We draw a hyper-sphere of volume V around xu. Assume this volume contains a total
of k examples, ki from class ωi.
n We can then approximate the likelihood functions using the kNN method by:

    P(x|ωi) ≅ ki / (Ni·V)

n Similarly, the unconditional density is estimated by

    P(x) ≅ k / (NV)

n And the priors are approximated by

    P(ωi) ≅ Ni / N

n Putting everything together, the Bayes classifier becomes

    P(ωi|x) = P(x|ωi)·P(ωi) / P(x) = [ki/(Ni·V)] · [Ni/N] / [k/(NV)] = ki / k
From [Bishop, 1995]
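A minimal sketch of this approximation (hypothetical function name; Euclidean distance assumed): count the class membership of the k nearest neighbors of xu and report ki/k for each class.

    import numpy as np

    def knn_posteriors(x_u, X, y, k):
        """Approximate P(w_i | x_u) = k_i / k from the k nearest neighbors of x_u."""
        dists = np.linalg.norm(X - x_u, axis=1)
        nearest = y[np.argsort(dists)[:k]]          # labels of the k closest training examples
        classes, counts = np.unique(nearest, return_counts=True)
        return dict(zip(classes, counts / k))       # class label -> k_i / k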
The k Nearest Neighbor classification rule
g The K Nearest Neighbor Rule (kNN) is a very intuitive method that
classifies unlabeled examples based on their similarity to examples in
the training set
n For a given unlabeled example xu∈ℜD, find the k “closest” labeled examples
in the training data set and assign xu to the class that appears most
frequently within the k-subset
g The kNN only requires
n An integer k
n A set of labeled examples (training data)
n A metric to measure “closeness”
g Example
n In the example below we have three classes
and the goal is to find a class label for the
unknown example xu
n In this case we use the Euclidean distance
and a value of k=5 neighbors
n Of the 5 closest neighbors, 4 belong to ω1 and
1 belongs to ω3, so xu is assigned to ω1, the
predominant class
[Figure: unknown example xu and its 5 nearest neighbors, drawn from classes ω1, ω2 and ω3]
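The rule needs only the three ingredients listed above: k, the labeled examples, and a metric. A hedged sketch (illustrative function name; Euclidean metric and majority voting as described):

    import numpy as np
    from collections import Counter

    def knn_classify(x_u, X, y, k=5):
        """Assign x_u to the class that occurs most often among its k nearest neighbors."""
        dists = np.linalg.norm(X - x_u, axis=1)     # Euclidean distance to every labeled example
        nearest_labels = y[np.argsort(dists)[:k]]   # labels of the k closest examples
        return Counter(nearest_labels).most_common(1)[0][0]

With the five neighbors in the figure, the vote is 4 to 1 in favor of ω1, reproducing the assignment described above.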
kNN in action: example 1
g We have generated data for a 2-dimensional 3-
class problem, where the class-conditional
densities are multi-modal, and non-linearly
separable, as illustrated in the figure
g We used the kNN rule with
n k = 5
n The Euclidean distance as a metric
g The resulting decision boundaries and decision
regions are shown below
kNN in action: example 2
g We have generated data for a 2-dimensional 3-class
problem, where the class-conditional densities are
unimodal, and are distributed in rings around a
common mean. These classes are also non-linearly
separable, as illustrated in the figure
g We used the kNN rule with
n k = 5
n The Euclidean distance as a metric
g The resulting decision boundaries and decision
regions are shown below
kNN as a lazy (machine learning) algorithm
g kNN is considered a lazy learning algorithm
n Defers data processing until it receives a request to classify an unlabeled example
n Replies to a request for information by combining its stored training data
n Discards the constructed answer and any intermediate results
g Other names for lazy algorithms
n Memory-based, Instance-based, Exemplar-based, Case-based, Experience-based
g This strategy is opposed to an eager learning algorithm which
n Compiles its data into a compressed description or model
g A density estimate or density parameters (statistical PR)
g A graph structure and associated weights (neural PR)
n Discards the training data after compilation of the model
n Classifies incoming patterns using the induced model, which is retained for future
requests
g Tradeoffs
n Lazy algorithms have lower computational costs than eager algorithms during training
n Lazy algorithms have greater storage requirements and higher computational
costs on recall
From [Aha, 1997]
Characteristics of the kNN classifier
g Advantages
n Analytically tractable
n Simple implementation
n Nearly optimal in the large sample limit (N→∞)
g P[error]Bayes < P[error]1NN < 2·P[error]Bayes
n Uses local information, which can yield highly adaptive behavior
n Lends itself very easily to parallel implementations
g Disadvantages
n Large storage requirements
n Computationally intensive recall
n Highly susceptible to the curse of dimensionality
g 1NN versus kNN
n The use of large values of k has two main advantages
g Yields smoother decision regions
g Provides probabilistic information
n The ratio of examples for each class gives information about the ambiguity of the decision
n However, too large a value of k is detrimental
g It destroys the locality of the estimation since farther examples are taken into account
g In addition, it increases the computational burden
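In practice, a common way to balance these competing effects is to choose k by cross-validation. A sketch assuming scikit-learn is available (the candidate values and fold count are arbitrary illustrative choices):

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    def choose_k(X, y, candidate_ks=(1, 3, 5, 7, 11, 21), folds=5):
        """Return the k with the highest cross-validated accuracy."""
        scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=folds).mean()
                  for k in candidate_ks}
        return max(scores, key=scores.get)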
kNN versus 1NN
[Figure: decision regions for 1-NN, 5-NN and 20-NN]
Optimizing storage requirements
g The basic kNN algorithm stores all the examples in the training set,
creating high storage requirements (and computational cost)
n However, the entire training set need not be stored since the examples may
contain information that is highly redundant
g A degenerate case is the earlier example with the multimodal classes. In that example,
each of the clusters could be replaced by its mean vector, and the decision boundaries
would be practically identical
n In addition, almost all of the information that is relevant for classification
purposes is located around the decision boundaries
g A number of methods, called edited kNN, have been derived to take
advantage of this information redundancy
n One alternative [Wilson 72] is to classify all the examples in the training set and
remove those examples that are misclassified, in an attempt to separate
classification regions by removing ambiguous points
n The opposite alternative [Ritter 75] is to remove training examples that are
classified correctly, in an attempt to define the boundaries between classes by
eliminating points in the interior of the regions
g A different alternative is to reduce the training examples to a set of
prototypes that are representative of the underlying data
n The issue of selecting prototypes will be the subject of the lectures on clustering
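A minimal sketch of the first (Wilson-style) alternative: classify every training example with its own k nearest neighbors, leave-one-out, and drop the examples that are misclassified. The function name and default k are illustrative:

    import numpy as np
    from collections import Counter

    def wilson_edit(X, y, k=3):
        """Drop training examples misclassified by their own k nearest neighbors."""
        keep = np.zeros(len(X), dtype=bool)
        for i in range(len(X)):
            dists = np.linalg.norm(X - X[i], axis=1)
            neighbors = [j for j in np.argsort(dists) if j != i][:k]   # exclude the example itself
            predicted = Counter(y[neighbors]).most_common(1)[0][0]
            keep[i] = predicted == y[i]
        return X[keep], y[keep]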
kNN and the problem of feature weighting
g The basic kNN rule computes the similarity measure based on the Euclidean distance
n This metric makes the kNN rule very sensitive to noisy features
n As an example, we have created a data set with three classes and two dimensions
g The first axis contains all the discriminatory information. In fact, class separability is excellent
g The second axis is white noise and, thus, does not contain classification information
g In the first example, both
axes are scaled properly
n The kNN (k=5) finds
decision boundaries fairly
close to the optimal
g In the second example, the magnitude of the second axis has been increased by two orders of magnitude (see the axes' tick marks)
n The kNN is biased by the
large values of the
second axis and its
performance is very poor
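A simple guard against this scaling problem, anticipating the normalization mentioned on the next slide, is to standardize each feature before computing distances; a minimal sketch:

    import numpy as np

    def standardize(X):
        """Rescale each feature to zero mean and unit variance so that no single axis dominates the Euclidean metric."""
        return (X - X.mean(axis=0)) / X.std(axis=0)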
Feature weighting
g The previous example illustrated the Achilles’ heel of the kNN classifier: its
sensitivity to noisy axes
n A possible solution would be to normalize each feature to N(0,1)
n However, normalization does not resolve the curse of dimensionality. A close look at the
Euclidean distance shows that this metric can become very noisy for high dimensional
problems if only a few of the features carry the classification information
g The solution to this problem is to modify the Euclidean metric by a set of
weights that represent the information content or “goodness” of each feature
n Note that this procedure is identical to performing a linear transformation where the
transformation matrix is diagonal with the weights placed in the diagonal elements
g From this perspective, feature weighting can be thought of as a special case of feature extraction
where the different features are not allowed to interact (null off-diagonal elements in the
transformation matrix)
g Feature subset selection, which will be covered later in the course, can be viewed as a special case
of feature weighting where the weights can only take binary values (0 or 1)
n Do not confuse feature-weighting with distance-weighting, a kNN variant that weights the
contribution of each of the k nearest neighbors according to their distance to the unlabeled
example
g Distance-weighting distorts the kNN estimate of P(ωi|x) and is NOT recommended
g Studies have shown that distance-weighting DOES NOT improve kNN classification performance
    d(xu, x) = Σ_{k=1..D} (xu(k) − x(k))²                  (unweighted metric)

    dw(xu, x) = Σ_{k=1..D} w(k) · (xu(k) − x(k))²          (weighted metric)
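As a sketch, the weighted metric is a one-line change to the distance computation; w is whatever weight vector the chosen feature-weighting method produces, and the function name is illustrative:

    import numpy as np

    def weighted_sq_distance(x_u, x, w):
        """Weighted squared Euclidean distance: sum_k w[k] * (x_u[k] - x[k])**2."""
        return np.sum(w * (x_u - x) ** 2)

Setting w to a vector of ones recovers the unweighted metric; forcing the entries to 0 or 1 amounts to feature subset selection, as noted above.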
Feature weighting methods
g Feature weighting methods are divided into two groups
n Performance bias methods
n Preset bias methods
g Performance bias methods
n These methods find a set of weights through an iterative procedure that uses the
performance of the classifier as guidance to select a new set of weights
n These methods normally give good solutions since they can incorporate the classifier’s
feedback into the selection of weights
g Preset bias methods
n These methods obtain the values of the weights using a pre-determined function that
measures the information content of each feature (e.g., the mutual information or the correlation
between each feature and the class label)
n These methods have the advantage of executing very fast
g The issue of performance bias versus preset bias will be revisited when we
cover feature subset selection (FSS)
n In FSS the performance bias methods are called wrappers and preset bias methods are
called filters
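As an example of a preset bias method, the weights could be derived from the mutual information between each feature and the class label. A hedged sketch assuming scikit-learn (the normalization to unit sum is one illustrative choice):

    from sklearn.feature_selection import mutual_info_classif

    def preset_bias_weights(X, y):
        """Feature weights proportional to the mutual information between each feature and the class label."""
        mi = mutual_info_classif(X, y)
        return mi / mi.sum()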
Improving the nearest neighbor search procedure
g The problem of nearest neighbor can be stated as follows
n Given a set of N points in D-dimensional space and an unlabeled example xu∈ℜD, find the point that minimizes the distance to xu
n The naïve approach of computing a set of N distances, and finding the (k) smallest becomes impractical for
large values of N and D
g There are two classical algorithms that speed up the nearest neighbor search
n Bucketing (a.k.a Elias’s algorithm) [Welch 1971]
n k-d trees [Bentley, 1975; Friedman et al, 1977]
g Bucketing
n In the Bucketing algorithm, the space is divided into identical cells and for
each cell the data points inside it are stored in a list
n The cells are examined in order of increasing distance from the query point
and for each cell the distance is computed between its internal data points
and the query point
n The search terminates when the distance from the query point to the cell
exceeds the distance to the closest point already visited
g k-d trees
n A k-d tree is a generalization of a binary search tree in high dimensions
g Each internal node in a k-d tree is associated with a hyper-rectangle and a hyper-plane orthogonal to one of the coordinate axes
g The hyper-plane splits the hyper-rectangle into two parts, which are associated with the child nodes
g The partitioning process goes on until the number of data points in the hyper-rectangle falls below some given threshold
n The effect of a k-d tree is to partition the (multi-dimensional) sample space according to the underlying
distribution of the data, the partitioning being finer in regions where the density of data points is higher
g For a given query point, the algorithm works by first descending the tree to find the data points lying in the cell that contains the query point
g Then it examines surrounding cells if they overlap the ball centered at the query point and the closest data point so far
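In practice the k-d tree search is available off the shelf; a hedged sketch using SciPy (the library, data, and query point are assumptions for illustration, not part of the original slides):

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))      # hypothetical training points
    tree = cKDTree(X)                   # build the k-d tree once

    x_u = np.array([0.1, -0.3])         # query point
    dists, idx = tree.query(x_u, k=5)   # distances and indices of the 5 nearest training points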
k-d tree example
[Figure: k-d tree data structure (3D case) and the induced partitioning (2D case)]
