Clustering in artificial intelligence

1
UNSUPERVISED LEARNINGUNSUPERVISED LEARNING
Supervised and Unsupervised Learning
ID3 and Version space are supervised learning algorithms
Unsupervised learning eliminates the teacher and requires
that the learners form concepts (categories) on their own
Conceptual clustering is the problem of discovering useful
categories in unclassified data (data whose categories are not
pre-determined)

2
CONCEPTUAL CLUSTERINGCONCEPTUAL CLUSTERING
Unsupervised Learning and Numeric taxonomy
The clustering problem begins with a collection of
unclassified objects and a means for measuring the similarity
of objects
The goal is to organize the objects into classes so that similar
objects are in one class
Numeric taxonomy is one of the oldest approaches to
clustering problem

3
In Numeric Taxonomy the objects are represented as a
collection of features and each of the feature has some
numeric value
An object is thus a vector of n feature values and can be
considered as a point in n-dimensional space
The similarity of any two objects can be measured by the
Euclidean distance between them in this space
Using this similarity metric, clustering algorithms build
clusters in a bottom up fashion (agglomerative clustering
strategy)

4
The categories are formed by the following approach
1. Examine all pairs of objects, and select the pair with the
highest degree of similarity and make that pair a cluster
2. Define the features of the cluster as some function, such
as average, of the features of the component members
and then replace the component objects with this cluster
definition
3. Repeat this process on the collection of objects until all
objects have been reduced to a single cluster
The result of this algorithm is a binary tree whose leaf nodes
are instances and whose internal nodes are clusters of
increasing size

5

6
We may extend this algorithm to objects represented as sets
of symbolic features. The only problem is in the measuring
the similarity of objects
A similarity metric can be the proportion of features that
any two objects have in common
object 1 = {small, red, rubber, ball}
object 2 = {small, blue, rubber, ball}
object 3 = {large, black, wooden, ball}
similarity (object 1, object 2) = ¾
similarity (object 1, object 3) = ¼
similarity (object 2, object 3) = ¼

7
In defining categories we cannot give all features equal
weight
In any given context, certain of an object’s features are more
important than others; simple similarity metrics treat all
features equally
The feature weights are to be set according to the goals of the
categorization

8
CLUSTER/2CLUSTER/2
CLUSTER/2 forms k categories by constructing individuals
around k seed objects
The parameter k is user adjustable
CLUSTER/2 evaluates the resulting clusters, selecting new
seeds and repeating the process until its quality criteria are
met

9
CLUSTER/2CLUSTER/2
The algorithm
• Select k seeds from the set of observed objects. This may
be done randomly or according to some selection function
• For each seed, using that seed as a positive instance and
all other seeds as negative instances, produce a
maximally general definition that tries to cover all of the
non-seed instances, until stopped by the negative
instances (other seeds)

10
CLUSTER/2CLUSTER/2
The algorithm
• Classify all objects in the sample according to these
descriptions. Note that this may lead to multiple
classifications of other, non seed, objects

11
CLUSTER/2CLUSTER/2
The algorithm
• Replace each maximally general description with a
maximally specific description that covers all objects in
the category. This decreases likelihood that classes
overlap on unseen objects

12
CLUSTER/2CLUSTER/2
The algorithm
• Classes may still overlap on given objects.
• Using a distance metric, select an element closest to the
center of each class. Using these central elements as new
seeds, repeat the above steps.

13
CLUSTER/2CLUSTER/2
The algorithm
• Stop when clusters are satisfactory. A typical quality
matrix is the complexity of the general descriptions of
classes. For instance, we might prefer clusters that yield
syntactically simple definitions, such as those with a small
number of conjuncts

14
CLUSTER/2CLUSTER/2
The algorithm
• If clusters are unsatisfactory and no improvement occurs
over several iterations, select the new seeds closest to the
edge of the cluster, rather than those at the center. This
favors creation of totally new clusters

15
AssignmentAssignment
Read Section 9.6.1 & 9.6.2 of Luger

Clustering in artificial intelligence

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (18)

Similar to Clustering in artificial intelligence

Similar to Clustering in artificial intelligence (20)

Recently uploaded

Recently uploaded (20)

Clustering in artificial intelligence