4. Distance based
• Place items in class to which they are “closest”.
• Distance measure is used to find alikeness of different items.
• Simple approach: classes represented by
– Centroid: central value.
ALGORITHM: Simple distance-based algorithm
Input : c1 , ... , cm // Centers for each class
t // Input tuple to classify
Output : C // Class to which t is assigned
dist = ∞;
for i := 1 to m do
if dis(ci , t) < dist then
C = class represented by ci ;
dist = dis(ci , t);
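The algorithm above can be sketched in Python; a minimal sketch, assuming Euclidean distance for dis(ci, t) and a dictionary of class centers (both illustrative choices, not fixed by the slides):

```python
import math

def dis(ci, t):
    # dis(ci, t): Euclidean distance between center ci and tuple t
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ci, t)))

def classify(centers, t):
    # centers: {class label: centroid tuple}; t: tuple to classify
    dist = math.inf
    C = None
    for label, ci in centers.items():
        d = dis(ci, t)
        if d < dist:   # keep the class whose center is closest so far
            dist = d
            C = label
    return C

# Example: t = (1, 2) is closer to A's centroid (0, 0) than to B's (10, 10)
print(classify({"A": (0.0, 0.0), "B": (10.0, 10.0)}, (1.0, 2.0)))  # "A"
```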
6. K Nearest Neighbor (KNN):
• A common classification scheme based on distance
measures is K nearest neighbors (KNN).
• The training set includes each item together with its class label.
• When a classification is to be made for a new item, its distance
to each item in the training set must be determined.
• Only the K closest entries in the training set are considered
further
• New item placed in class that contains the most items from this
set of K closest items.
• O(q) for each tuple to be classified. (Here q is the size of the
training set.)
• The KNN technique is extremely sensitive to the value of K. A rule of
thumb is that K ≤ √q, the square root of the number of training items.
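The KNN steps above can be sketched in Python; a minimal sketch, assuming Euclidean distance and majority vote among the K closest items (the data shapes are illustrative assumptions):

```python
import math
from collections import Counter

def knn_classify(training, t, k):
    # training: list of (tuple, class label) pairs; t: new item; k: neighbors used
    # Compute the distance from t to every training item -> O(q) per tuple
    by_distance = sorted(training, key=lambda item: math.dist(item[0], t))
    # Place t in the class holding the most of the K closest training items
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Example: the 3 nearest neighbors of (0.5, 0.5) all carry label "A"
training = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
            ((5, 5), "B"), ((6, 5), "B")]
print(knn_classify(training, (0.5, 0.5), 3))  # "A"
```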
9. Classification Using Decision Trees
• Partitioning based: Divide search space into rectangular
regions.
• Tuple placed into class based on the region within which it falls.
• DT approaches differ in how the tree is built (DT induction).
• Internal nodes associated with attribute and arcs with values for
that attribute.
• Algorithms: ID3, C4.5, CART
10. Decision Tree
Given:
– D = {t1, …, tn} where ti=<ti1, …, tih>
– Database schema contains {A1, A2, …, Ah}
– Classes C={C1, …., Cm}
A Decision (or Classification) Tree is a tree associated
with D such that
– Each internal node is labeled with attribute, Ai
– Each arc is labeled with predicate which can be
applied to attribute at parent
– Each leaf node is labeled with a class, Cj
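The structure defined above can be sketched in Python; a minimal sketch in which internal nodes carry an attribute Ai, arcs carry predicates on that attribute, and leaves carry a class Cj. The attributes (income, age), thresholds, and class names are illustrative assumptions, not from the slides:

```python
# Internal node: (attribute Ai, [(arc predicate, subtree), ...]); leaf: class Cj
tree = ("income", [
    (lambda v: v < 40_000, "C_low"),             # arc predicate on parent's attribute
    (lambda v: v >= 40_000, ("age", [
        (lambda v: v < 30, "C_mid"),
        (lambda v: v >= 30, "C_high"),
    ])),
])

def classify_tree(node, t):
    # t: dict mapping attribute name Ai -> value for the tuple being classified
    if isinstance(node, str):        # leaf node labeled with a class Cj
        return node
    attr, arcs = node
    for predicate, child in arcs:    # follow the arc whose predicate holds
        if predicate(t[attr]):
            return classify_tree(child, t)

# income >= 40,000 leads to the age subtree; age < 30 leads to leaf "C_mid"
print(classify_tree(tree, {"income": 50_000, "age": 25}))  # "C_mid"
```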