Unsupervised Learning is a form of machine learning. All topics are covered from Artificial Intelligence: Structures and Strategies for Complex Problem Solving, Fifth Edition, by George F. Luger.
2. What we will be studying.
Automated Mathematician (AM)
Conceptual Clustering
COBWEB & Structure of Taxonomic Knowledge
3. So what is Unsupervised Learning, and how is it different from Supervised Learning?
4. Automated Mathematician (AM)
● One of the earliest successful discovery systems.
● Created by Douglas Lenat in Lisp.
● Began with the concepts of set theory, operations for creating new knowledge by
modifying and combining existing concepts, and a set of heuristics for detecting
interesting concepts.
● Limitations
○ Although AM discovered prime numbers and several other interesting concepts,
it failed to progress beyond elementary number theory.
○ Inability to “learn to learn”: it did not acquire new heuristics from its new
discoveries in mathematics.
5. Clustering
● Clustering is the task of grouping a set of objects so that objects in the same
group (called a cluster) are more similar to each other than to those in other
groups (clusters).
● It is a main task of exploratory data mining and a common technique for
statistical data analysis.
● Used in many fields, including machine learning, pattern recognition, and image
analysis.
6. The clustering problem
● Begins with a collection of unclassified objects and a means for measuring the
similarity of objects.
● The goal is to organize the objects into classes that meet some standard of
quality (such as maximizing the similarity of objects in the same class).
● Two strategies: numeric and agglomerative.
7. cont.
An agglomerative clustering algorithm builds clusters bottom-up:
● Examine all pairs of objects, select the pair with the highest degree of
similarity, and make that pair a cluster.
● Define the features of the cluster as some function (such as the average) of the
features of its component members, then replace the component objects with
this cluster definition.
● Repeat the process on the collection of objects until all objects have been
reduced to a single cluster.
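The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: objects are numeric feature tuples, similarity is taken to be negative squared Euclidean distance, and a merged cluster is defined by the unweighted average of its two components. The names `similarity`, `average`, and `agglomerate` are hypothetical.

```python
from itertools import combinations

def similarity(u, v):
    """Negative squared Euclidean distance: larger means more similar."""
    return -sum((x - y) ** 2 for x, y in zip(u, v))

def average(u, v):
    """Cluster definition: feature-wise average of the two components."""
    return tuple((x + y) / 2 for x, y in zip(u, v))

def agglomerate(points):
    """Build a binary tree bottom-up; leaves are points, internal nodes are pairs."""
    if not points:
        raise ValueError("need at least one object")
    # Each entry pairs a feature vector with the subtree it summarizes.
    clusters = [(tuple(p), tuple(p)) for p in points]
    while len(clusters) > 1:
        # Examine all pairs and select the most similar one.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: similarity(clusters[ij[0]][0], clusters[ij[1]][0]),
        )
        (fi, ti), (fj, tj) = clusters[i], clusters[j]
        merged = (average(fi, fj), (ti, tj))
        # Replace the two components with the merged cluster definition.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][1]

tree = agglomerate([(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)])
print(tree)
```

With these four points, the two nearby pairs are merged first and the two resulting clusters are joined last. Scanning all pairs on every pass is quadratic per merge, which is fine for a small illustration but not for large collections.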
8. The result is a binary tree whose leaf nodes are instances and whose internal
nodes are clusters of increasing size.
We may extend the algorithm to objects represented as sets of symbolic features.
Here similarity is the number of features two objects share divided by the number
of distinct features across both objects.
obj1 = {small, red, rubber, ball}
obj2 = {small, blue, rubber, ball}
obj3 = {large, black, wooden, ball}
sim(obj1, obj2) = 3/5
sim(obj1, obj3) = sim(obj2, obj3) = 1/7
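The values above match the ratio of shared features to total distinct features (the Jaccard index). A minimal sketch, assuming that is the intended metric; the function name `sim` follows the slide, and returning 1 for two empty objects is an assumption made only to avoid dividing by zero.

```python
from fractions import Fraction

def sim(a, b):
    """Shared features divided by total distinct features (Jaccard index)."""
    a, b = set(a), set(b)
    if not a and not b:
        return Fraction(1)  # assumption: two empty objects count as identical
    return Fraction(len(a & b), len(a | b))

obj1 = {"small", "red", "rubber", "ball"}
obj2 = {"small", "blue", "rubber", "ball"}
obj3 = {"large", "black", "wooden", "ball"}

print(sim(obj1, obj2))  # 3/5
print(sim(obj1, obj3))  # 1/7
print(sim(obj2, obj3))  # 1/7
```

obj1 and obj2 share three features (small, rubber, ball) out of five distinct ones, while obj3 shares only "ball" with either of them out of seven distinct features.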
9. Conceptual Clustering(CC)
CC addresses the shortcomings of purely similarity-based clustering by using
machine learning techniques to produce general concept definitions and by applying
background knowledge when forming categories.
CLUSTER/2 is a good example of the CC approach.
10. CLUSTER/2
● CLUSTER/2 forms k categories by constructing individuals around k seed objects.
● CLUSTER/2 evaluates the resulting clusters, selecting new seeds and repeating the
process until its quality criterion is met. The algorithm is defined as:
○ Select k seeds from the set of observed objects (selection is done randomly
or by some selection function).
○ For each seed, using that seed as a positive instance and all other seeds as
negative instances, produce a maximally general definition that covers all of
the positive and none of the negative instances. (This may lead to multiple
classifications of non-seed objects.)
○ Classify all objects in the sample according to these descriptions. Replace each
maximally general description with a maximally specific description that
covers all objects in the category. This decreases the likelihood that classes
overlap on unseen objects.
11. cont.
○ Classes may still overlap on the given objects. CLUSTER/2 includes an algorithm
for adjusting overlapping definitions.
○ Using a distance metric, select an element closest to the center of each class
(the distance metric could be similar to the similarity metric discussed earlier).
○ Using these central elements as new seeds, repeat the preceding five steps until
the desired quality is reached.
○ If the clusters are unsatisfactory and no improvement occurs over several
iterations, select new seeds closest to the edge of each cluster, rather than
those at the center.
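The re-seeding steps above can be sketched as follows. This is an illustration only and not CLUSTER/2 itself: it assumes objects are feature sets, uses Jaccard distance as the metric, and treats the "center" of a class as the member with the smallest total distance to the others (and the "edge" as the member with the largest). The names `distance`, `central_seed`, and `edge_seed` are hypothetical.

```python
def distance(a, b):
    """Jaccard distance between two feature sets (an assumed metric)."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def total_distance(obj, members):
    """Sum of distances from obj to every member of the class."""
    return sum(distance(obj, other) for other in members)

def central_seed(members):
    """Member closest to the center: smallest total distance to the rest."""
    return min(members, key=lambda o: total_distance(o, members))

def edge_seed(members):
    """Member closest to the edge: largest total distance to the rest."""
    return max(members, key=lambda o: total_distance(o, members))

cluster = [
    frozenset({"small", "red", "rubber", "ball"}),
    frozenset({"small", "blue", "rubber", "ball"}),
    frozenset({"small", "red", "wooden", "ball"}),
    frozenset({"large", "black", "wooden", "cube"}),
]

print(sorted(central_seed(cluster)))  # the small red rubber ball
print(sorted(edge_seed(cluster)))     # the large black wooden cube
```

In this sample class the small red rubber ball overlaps most with the other members, so it becomes the central seed, while the cube shares almost nothing with them and is the edge choice.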
12. COBWEB & Structure of Taxonomic Knowledge
● COBWEB is an incremental system for hierarchical conceptual clustering.
● There are four basic operations COBWEB employs in building the classification
tree.
○ Merging two nodes: Replacing them with a node whose children are the union
of the original nodes' sets of children and which summarizes the attribute
value distributions of all objects classified under them.
○ Splitting a node: A node is split by replacing it with its children.
○ Inserting a new node: A node is created corresponding to the object being
inserted into the tree.
○ Passing an object down the hierarchy: Effectively calling the COBWEB
algorithm on the object and the subtree rooted in the node.
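The merge and split operations described above can be sketched on a small tree structure. This is a hedged illustration, not COBWEB's actual data structure: it assumes objects are attribute-value dictionaries and that each node stores an object count plus a frequency table of (attribute, value) pairs. The names `Node`, `leaf`, `merge`, and `split` are hypothetical.

```python
from collections import Counter

class Node:
    def __init__(self, children=()):
        self.children = list(children)
        # Summaries are aggregated from the children, if any.
        self.count = sum(c.count for c in self.children)
        self.values = Counter()  # (attribute, value) -> frequency
        for c in self.children:
            self.values.update(c.values)

def leaf(obj):
    """Insert operation: a new node that corresponds to a single object."""
    node = Node()
    node.count = 1
    node.values.update(obj.items())
    return node

def merge(parent, a, b):
    """Replace a and b with one node holding the union of their children."""
    merged = Node(a.children + b.children)
    merged.count = a.count + b.count
    merged.values = a.values + b.values
    parent.children = [c for c in parent.children if c is not a and c is not b]
    parent.children.append(merged)
    return merged

def split(parent, node):
    """Replace node with its children in the parent's child list."""
    i = parent.children.index(node)
    parent.children[i:i + 1] = node.children

red1 = leaf({"color": "red", "size": "small"})
red2 = leaf({"color": "red", "size": "large"})
blue = leaf({"color": "blue", "size": "small"})
root = Node([Node([red1, red2]), Node([blue])])

merged = merge(root, root.children[0], root.children[1])
print(merged.count, merged.values[("color", "red")])  # 3 2
split(root, merged)
print(len(root.children))  # 3
```

After the merge, the new node summarizes all three objects; splitting it then promotes its three children directly under the root.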
13. cont.
● COBWEB performs a hill-climbing search through the space of possible taxonomies.
● It initializes the taxonomy to a single category. For each subsequent instance,
the algorithm begins at the root category and moves down through the tree. At
each level it evaluates the taxonomies resulting from:
○ Placing the instance in the best existing category.
○ Adding a new category containing only the instance.
○ Merging two existing categories into one and adding the instance to that
category.
○ Splitting an existing category into two and placing the instance in the best
new resulting category.
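The slides do not say how the candidate taxonomies are scored. COBWEB, as described in the literature, uses category utility for this purpose. The sketch below shows one common formulation for nominal attributes, assuming objects are attribute-value dictionaries and every category is non-empty. The names `expected_correct` and `category_utility` are hypothetical.

```python
from collections import Counter

def expected_correct(objects):
    """Sum over all attribute values of P(attribute = value)^2 within objects."""
    counts = Counter(pair for obj in objects for pair in obj.items())
    n = len(objects)
    return sum((c / n) ** 2 for c in counts.values())

def category_utility(partition):
    """Average gain in predictability of attribute values given the categories."""
    everything = [obj for category in partition for obj in category]
    n = len(everything)
    base = expected_correct(everything)
    gain = sum(
        len(category) / n * (expected_correct(category) - base)
        for category in partition
    )
    return gain / len(partition)

red, blue = {"color": "red"}, {"color": "blue"}
good = [[red, red], [blue]]   # categories agree with the color attribute
poor = [[red, blue], [red]]   # one category mixes both colors

print(category_utility(good) > category_utility(poor))  # True
```

A hill-climbing step would compute this score for each of the four candidate taxonomies at a level and keep the highest-scoring one.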