Unsupervised Slides


Published in: Education

  1. 5 - Unsupervised Learning
       • Introduction
       • Statistical Clustering
       • Conceptual Clustering
       • UNIMEM
       • COBWEB

     Introduction
       • Unsupervised Learning
         • Learner receives no explicit information about classification of input examples.
         • Information is implicit.
         • Aim of learning process - to discover regularities in the input data.
         • Typically, consists of partitioning instances into classes (based on some similarity metric).
         • i.e. finding clusters of instances in the instance space.
         • Not surprising that unsupervised learning systems sometimes closely resemble statistical clustering systems.

     What is Clustering ?
       • Common problem - construction of meaningful classifications of observed objects or situations.
       • Often known as numerical taxonomy - since it involves production of a class hierarchy (classification scheme) using a mathematical measure of similarity over the instances.

     Simple Clustering Algorithm
       • Initialize
         • Set D to be the set of singleton sets such that each set contains a unique instance.
       • Until D contains only 1 element, do the following:
         • Form a matrix of similarity values for all elements of D
           • Using some given similarity function
         • Merge those elements of D which have a maximum similarity value.
       • Often known as agglomerative clustering.
         • Works bottom-up - trying to build larger clusters.
       • Alternative - divisive clustering.
         • Works top-down (cf ID3)
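
     The Simple Clustering Algorithm slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the instances are distinct numbers, `similarity` is a hypothetical function (negative distance), and the similarity of two clusters is taken to be that of their closest members (a single-link rule, which the slide does not specify).

```python
from itertools import combinations

def similarity(a, b):
    """Hypothetical similarity between two numeric instances: negative distance."""
    return -abs(a - b)

def cluster_similarity(c1, c2):
    """Assumed single-link rule: similarity of the closest pair of members."""
    return max(similarity(a, b) for a in c1 for b in c2)

def agglomerate(instances):
    """Merge the most similar pair of clusters until one cluster remains.

    Instances are assumed to be distinct. Returns the merges in order.
    """
    # Initialize: D is the set of singleton sets, one per instance.
    d = [frozenset([x]) for x in instances]
    merges = []
    while len(d) > 1:
        # Compare every pair of elements of D and pick the most similar pair.
        c1, c2 = max(combinations(d, 2), key=lambda p: cluster_similarity(*p))
        # Replace the two chosen elements by their union.
        d = [c for c in d if c not in (c1, c2)]
        d.append(c1 | c2)
        merges.append((set(c1), set(c2)))
    return merges

merges = agglomerate([1.0, 1.5, 5.0, 5.2, 9.0])
for left, right in merges:
    print(sorted(left), "+", sorted(right))
```

     The order of the merges records the class hierarchy from the bottom up: the earliest merges are the tightest clusters.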
  2. Clustering
       • Traditional techniques
         • Often inadequate - as they arrange objects into classes solely on the basis of a numerical measure of object similarity.
         • Only information used is that contained in the instances themselves.
         • Algorithms unable to take account of semantic relationships among instance attributes or global concepts that might be of relevance in forming a classification scheme.
       • Conceptual Clustering
         • Idea first introduced by R S Michalski - 1980
         • Defined as process of constructing a concept network characterizing a collection of objects with nodes marked by concepts describing object classes & links marked by the relationships between the classes.

     Clustering
       • Consider this example:
         [Figure: example points, including A and B, arranged in two diamonds]
       • We would not cluster A and B together - but would cluster them into the 2 diamonds.
       • Partitioning using concept membership rather than distance.
       • Points are placed in the same cluster if collectively they represent the same concept.
       • This is basis of conceptual clustering

     Conceptual Clustering
       • Can be regarded as:
       • Given:
         • A set of objects
         • A set of attributes to be used to characterise objects
         • A body of background knowledge - includes problem constraints, properties of attributes, criteria for evaluating quality of constructed classifications.
       • Find:
         • A hierarchy of object classes
         • Each node should form a coherent concept
         • Compact
         • Easily represented in terms of a definition or rule that has a natural interpretation for humans

     Conceptual Clustering
       • Given animal descriptors:

         name        body-cover       heart-chamber    body-temp     fertilisation
         mammal      hair             four             regulated     internal
         bird        feathers         four             regulated     internal
         reptile     cornified-skin   imperfect-four   unregulated   internal
         amphibian   moist-skin       three            unregulated   external
         fish        scales           two              unregulated   external

       • Classification hierarchy produced:

         animals
           mammals/bird
             mammal
             bird
           reptile
           amphibian/fish
             amphibian
             fish
  3. Conceptual Clustering
       • Michalski - 1980
       • Conjunctive conceptual clustering
         • Concept class consists of conjunctive statements involving relations on selected object attributes.
         • Method arranges objects into a hierarchy of classes.
       • CLUSTER/2
         • Used to construct classification hierarchy of a large collection of Spanish folk songs.

     UNIMEM
       • Lebowitz - 1987
       • Essentially a divisive clustering algorithm
       • Uses a decision tree structure as its basic representation.
       • If asked to classify an instance - searches down through the tree, testing attributes & returns a classification based on the relevant leaf nodes.
       • If asked to update the tree so as to represent a new instance - searches down through the tree looking for a suitable place to add in new structure.

     UNIMEM
       • Basic clustering principle:
         • Instance matches a node if it is covered by that node (concept)
         • Matching determined by testing to see what proportion of the instance's attributes are associated with the node.
         • Search process returns all the most specific nodes that explain (cover) the new instance.
         • UNIMEM then generalizes each node in this set as necessary in order to account for the new instance.
         • The new instance is then classified with all other instances stored at the node.

     UNIMEM
       • Add new nodes into tree as & when they appear to be warranted by the presented instances.
       • UNIMEM actually stores each presented instance at all nodes which cover it.
       • If two instances stored at a node are particularly similar - then create an extra child node whose definition covers the two instances in question.
       • The two instances are then relocated to this node.
       • As new instances are processed - new nodes are created & hierarchy grows downwards.
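
     The matching test described on the UNIMEM slides (what proportion of the instance's attributes are associated with the node) might look roughly like the sketch below. The dictionary representation, the function names and the `threshold` value are all assumptions made for illustration; the slides do not say how UNIMEM encodes nodes or where it sets the cut-off.

```python
def match_proportion(instance, node_features):
    """Fraction of the instance's attributes whose values agree with the node."""
    if not instance:
        return 0.0
    agree = sum(1 for attr, value in instance.items()
                if node_features.get(attr) == value)
    return agree / len(instance)

def covers(instance, node_features, threshold=0.5):
    """Hypothetical cover test: the node covers the instance if enough attributes agree."""
    return match_proportion(instance, node_features) >= threshold

# Hypothetical concept node; instances taken from the animal descriptor table.
warm_blooded = {"heart-chamber": "four", "body-temp": "regulated"}
mammal = {"body-cover": "hair", "heart-chamber": "four",
          "body-temp": "regulated", "fertilisation": "internal"}
fish = {"body-cover": "scales", "heart-chamber": "two",
        "body-temp": "unregulated", "fertilisation": "external"}

print(match_proportion(mammal, warm_blooded), covers(mammal, warm_blooded))
print(match_proportion(fish, warm_blooded), covers(fish, warm_blooded))
```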
  4. UNIMEM Algorithm
       • Initialize decision tree to be an empty root node.
       • Apply following steps to each instance:
         • Search the tree depth-first for most specific concept nodes that the instance matches.
         • Add new instance to the tree at or below these nodes
           • Involves comparing new instance to ones already stored there & creating new subnodes if appropriate.

     UNIMEM as Memory
       • UNIMEM actually stores new instances inside the tree.
       • Can thus be viewed as a type of memory.
       • GBM - Generalisation-Based Memory
       • Structure of hierarchy enables classes of instances to be accessed much more efficiently than would be the case if all instances were stored in a linear memory structure.

     COBWEB
       • Fisher - 1987
       • Incremental system for hierarchical conceptual clustering
       • Carries out hill-climbing search through a space of hierarchical classification schemes using operators which enable bidirectional travel through this space.
       • Features of COBWEB:
         • Heuristic evaluation function to guide search.
         • State representation - structure of hierarchies & representation of concepts.
         • Operators used to build classification schemes
         • Control strategy.

     COBWEB
       • Based on principle that a good clustering should minimize distance between two points within a cluster & maximize distance between points in different clusters.
       • Good clustering defined as:
         • One which maximizes intra-cluster similarity & minimizes inter-cluster similarity.
       • Goal of COBWEB - to find optimum tradeoff between these two !
  5. Category Utility
       • Can be viewed as a function which rewards similarity of objects within same class & dissimilarity of objects in different classes.
       • Gluck & Corter - 1985
       • Category utility function:

         $$CU = \frac{\sum_{k=1}^{n} P(C_k) \left[ \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2 - \sum_i \sum_j P(A_i = V_{ij})^2 \right]}{n}$$

     Representation
       • Choice of category utility as heuristic measure dictates a concept representation different to logical, typically conjunctive representations used in AI.
       • Probabilistic representation of {fish, amphibian, mammal}

         Attributes      Values & Probabilities
         body-cover      scales (0.33), moist-skin (0.33), hair (0.33)
         heart-chamber   two (0.33), three (0.33), four (0.33)
         body-temp       unregulated (0.67), regulated (0.33)
         fertilisation   external (0.67), internal (0.33)

       • Each node in the classification tree is a probabilistic concept which represents an object class & summarises the objects classified under the node.

     Operators
       • Incorporation of a new object into the tree is a process of classifying an object by descending the tree along an appropriate path & performing one of several operations at each level.
       • Operators include:
         • Classifying object with respect to an existing class.
         • Creating a new class.
         • Combining two classes into a single class.
         • Dividing a class into several classes.

     Operators contd ...
       • Classifying object in existing class
         • To determine which category best "hosts" a new object, COBWEB tentatively places the object in each category.
         • Partition which results from adding object to a given node is evaluated using category utility function.
         • Node which results in the best partition (highest CU) is identified as the best existing host for the new object.
       • Creating a new class
         • Quality of the partition resulting from placing the object in the best existing host is compared to partition resulting from creation of a new singleton class containing the object.
         • Depending on which partition is best - object is placed in the best existing class or a new class is created.
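
     As a rough illustration of the category utility function on the Category Utility slide, the sketch below estimates every probability by simple counting and scores a partition of {fish, amphibian, mammal}. The function names and the list-of-dicts data layout are assumptions, not part of the slides; the attribute values come from the animal descriptor table.

```python
from collections import Counter

def value_probs(instances):
    """Return {(attribute, value): probability} estimated by counting."""
    counts = Counter((a, v) for inst in instances for a, v in inst.items())
    return {key: c / len(instances) for key, c in counts.items()}

def category_utility(partition):
    """Category utility of a partition, given as a list of clusters (lists of dicts)."""
    everything = [inst for cluster in partition for inst in cluster]
    # Sum over i, j of P(A_i = V_ij)^2, ignoring class membership.
    baseline = sum(p ** 2 for p in value_probs(everything).values())
    total = 0.0
    for cluster in partition:
        p_cluster = len(cluster) / len(everything)  # P(C_k)
        # Sum over i, j of P(A_i = V_ij | C_k)^2 within this cluster.
        within = sum(p ** 2 for p in value_probs(cluster).values())
        total += p_cluster * (within - baseline)
    return total / len(partition)  # divide by n, the number of classes

fish = {"body-cover": "scales", "heart-chamber": "two",
        "body-temp": "unregulated", "fertilisation": "external"}
amphibian = {"body-cover": "moist-skin", "heart-chamber": "three",
             "body-temp": "unregulated", "fertilisation": "external"}
mammal = {"body-cover": "hair", "heart-chamber": "four",
          "body-temp": "regulated", "fertilisation": "internal"}

cu_grouped = category_utility([[fish, amphibian], [mammal]])
cu_singletons = category_utility([[fish], [amphibian], [mammal]])
cu_odd = category_utility([[fish, mammal], [amphibian]])
print(cu_grouped, cu_singletons, cu_odd)
```

     On this data the partition that groups fish with amphibian scores highest and the one that groups fish with mammal scores lowest, which is the kind of comparison COBWEB makes when it tentatively places an object in each candidate class.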
  6. Example
       • Existing Classification Structure:

         P(C0) = 1.0, P(scales | C0) = 0.5, ...
           P(C1) = 0.5, P(scales | C1) = 1.0, ...
           P(C2) = 0.5, P(moist | C2) = 1.0, ...

       • Add "mammal":

         P(C0) = 1.0, P(scales | C0) = 0.33, ...
           P(C1) = 0.33, P(scales | C1) = 1.0, ...
           P(C2) = 0.33, P(moist | C2) = 1.0, ...
           P(C3) = 0.33, P(hair | C3) = 1.0, ...

       • Add "bird":

         P(C0) = 1.0, P(scales | C0) = 0.25, ...
           P(C1) = 0.25, P(scales | C1) = 1.0, ...
           P(C2) = 0.25, P(moist | C2) = 1.0, ...
           P(C3) = 0.5, P(hair | C3) = 0.5, ...
             P(C4) = 0.5, P(hair | C4) = 1.0, ...
             P(C5) = 0.5, P(feath | C5) = 1.0, ...

     Operators contd ...
       • While the first two operators are effective in many ways - by themselves they are very sensitive to ordering of input data.
       • Merging & splitting operators implemented to guard against these effects.
       • Merging
         • Two nodes of a level are combined in hope that the resultant partition is of better quality.
         • Involves creating a new node
         • Two original nodes are made children of newly created node.
       • Splitting
         • Node may be deleted and its children promoted.

     Merging & Splitting Operators
       • Node Merging
         [Figure: children A and B of P are placed under a new node, which becomes a child of P]
       • Node Splitting
         [Figure: a child of P is deleted and its children A and B become children of P]

     COBWEB Control Structure

       COBWEB ( Object, Root of classification tree )
         1. Update counts of the Root
         2. IF Root is a leaf
            THEN Return the expanded leaf to accommodate Object
            ELSE Find the child of Root which best hosts Object & perform one of the following:
              a. Consider creating a new class & do so if appropriate
              b. Consider node merging & do so if appropriate, call COBWEB ( Object, Merged node )
              c. Consider node splitting & do so if appropriate, call COBWEB ( Object, Root )
              d. IF None of the above were performed
                 THEN Call COBWEB ( Object, Best child of Root )
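
     The structural effect of the merging and splitting operators can be sketched on a bare tree. This is a simplified illustration, not COBWEB itself: the `Node` class and function names are hypothetical, and the sketch leaves out the probability counts and the category utility check that decide whether a merge or split is worthwhile.

```python
class Node:
    """Hypothetical tree node holding only a name and a list of children."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])

def merge(parent, a, b):
    """Create a new node under parent and make a and b its children."""
    new_node = Node(f"{a.name}+{b.name}", [a, b])
    parent.children = [c for c in parent.children if c is not a and c is not b]
    parent.children.append(new_node)
    return new_node

def split(parent, node):
    """Delete node from parent and promote node's children in its place."""
    index = parent.children.index(node)
    parent.children[index:index + 1] = node.children
    return parent

a, b, c = Node("A"), Node("B"), Node("C")
p = Node("P", [a, b, c])

merged = merge(p, a, b)
after_merge = [child.name for child in p.children]

split(p, merged)
after_split = [child.name for child in p.children]
print(after_merge, after_split)
```

     Splitting the node that was just merged restores the original set of children, which shows why the two operators let the search travel in both directions.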
  7. AutoClass
       • Cheeseman et al - 1988
       • Bayesian statistical technique
         • Bayes' theorem - formula for combining probabilities
       • Technique determines:
         • Most probable number of classes
         • Their probabilistic descriptions
         • Probability that each object is a member of each class
       • AutoClass does not do absolute partitioning of data into classes.
         • Calculates the probability of each object's membership in each class.
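
     The idea of probabilistic rather than absolute membership can be illustrated with Bayes' theorem directly. The sketch below is not AutoClass's actual procedure; the class names, priors and likelihoods are made-up numbers, and the function name is hypothetical. It only shows how a prior P(class) and a likelihood P(object | class) combine into P(class | object).

```python
def membership(priors, likelihoods):
    """Posterior P(class | object) for each class, via Bayes' theorem."""
    # Numerator of Bayes' theorem: P(object | class) * P(class).
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    # Denominator: P(object), summed over all classes.
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}

# Hypothetical two-class example with equal priors.
posterior = membership({"c1": 0.5, "c2": 0.5}, {"c1": 0.8, "c2": 0.2})
print(posterior)
```

     The object is not assigned outright to either class; it keeps a membership probability for each, and these sum to 1.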